Technical computing today is increasingly dominated by design and analysis
tasks that require high-performance workstation and software products. Some 
of the products described in this issue address the needs of this emerging 
market. 

On the software side, we have the DirectModel 3D modeling toolkit and the
HP implementation of the OpenGL® graphics standard. The toolkit provides
application developers with the capability to develop applications that can
construct 3D models containing millions or billions of polygons. DirectModel
is built on top of the HP OpenGL product. OpenGL is a vendor-neutral,
multiplatform, industry-standard application programming interface (API) for
developing 2D and 3D visual applications.

For running these applications, we have the HP Kayak PC-based workstation
running the Windows® NT operating system. HP Kayak provides world-leading
3D graphics performance typically found in high-end UNIX® workstations.
Much of the hardware architecture for HP Kayak is based on the VISUALIZE
fx4 graphics accelerator, which is designed to provide native acceleration for
the OpenGL API.

A common theme underlying the development of all these products is the
desire to shorten the time to market. Concurrent engineering was employed
in the OpenGL project to achieve this goal. Processes done in serial were
modified to be done in parallel, shortening the product development
cycle. Quality engineers at the HP Kobe Instrument Division reengineered
their quality assurance process to deal with the time-to-market
issue and still maintain high-quality released software.

We have two articles about HP-UX workstations. One describes a fea- 
ture that allows multiple monitors to be configured as one contiguous 
viewing space, and the other discusses the challenges of adding the 
Peripheral Component Interconnect, or PCI, to HP B-class and C-class 
workstations. 

Information is the fuel that drives today's enterprises. Thus, we have 
three articles that discuss the use of information to do such tasks as 
linking business manufacturing software to the factory floor, providing 
a knowledge database for support personnel, and forecasting compo- 
nent demand in material planning. 

The article about HP VEE (Visual Engineering Environment) is an example
of our new publishing paradigm of using the web to extend or complement
what appears in the printed version of the Hewlett-Packard
Journal.

C. L. Leath 
Managing Editor 
In August we will have articles about a
150-MHz-bandwidth membrane hydrophone,
units of measurement for optical
instruments, and efforts to improve the
reliability of ceramic pin grid array
packaging and surface-mount LEDs. We will
also have articles from the HP Design
Technology Conference, the HP Compression
Conference, and the HP Electronic
and Assembly Conference.
Articles



An API for Interfacing Interactive 
3D Applications to High-Speed 
Graphics Hardware 

Kevin T. Lefebvre and John M. Brown 

An introduction to the articles in this 
issue that describe the HP hardware and 
software products that implement or 
support the OpenGL® specification.

The Fast-Break Program



An Overview of the HP OpenGL 
Software Architecture 



Kevin T. Lefebvre, Robert J. Casey, Michael 
J. Phelps, Courtney D. Goeltzenleuchter, 
and Donley B. Hoffman 

The features in the software component 
of the HP OpenGL product that differ- 
entiate it from other OpenGL implemen- 
tations include performance, quality, and 
reliability. 




The DirectModel Toolkit: 
Meeting the 3D Graphics Needs 
of Technical Applications 



Brian E. Cripe and Thomas A. Gaskins 

Today's highly complex mechanical design 
automation systems require a modelling 
toolkit for developing interactive applica- 
tions capable of handling 3D models con- 
taining millions or billions of polygons. 



An Overview of the VISUALIZE fx 
Graphics Accelerator Hardware 



Noel D. Scott, Daniel M. Olsen, and Ethan W. 
Gannett 

Five custom integrated circuits make up 
the high-speed VISUALIZE fx family of
graphics subsystems. 

(30) Occlusion Culling 
(32) Fast Virtual Texturing 




HP Kayak: A PC Workstation with 
Advanced Graphics Performance 



Ross A. Cunniff

Graphics performance typically found
in high-speed UNIX® workstations has
been incorporated into a PC workstation
running the Windows® NT environment.




Concurrent Engineering in 
OpenGL's Product Development 

Robert J. Casey and L. Leonard Lindstone 

The authors describe how the concepts 
of concurrent engineering helped the HP 
OpenGL project to achieve a shorter time 
to market and a reduction in rework. 



Advanced Display Technologies 
on HP-UX Workstations 

Todd M. Spencer, Paul M. Anderson, and 
David Sweetser 

Recent versions of the HP-UX operating 
system contain features that allow users 
to create more viewing space by configur- 
ing multiple monitors into a single logical 
screen. 
Delivering PCI in HP B-Class and
C-Class Workstations: A Case
Study in the Challenges of
Interfacing with Industry
Standards

Ric L. Lewis, Erin A. Handgen, Nicholas J.
Ingegneri, and Glen T. Robinson

The authors discuss some of the challenges
involved in incorporating an industry-standard
I/O subsystem into HP workstations.




Kenn S. Jennyc

HP Enterprise Link is a middleware software
product that allows business management
applications to exchange information
with applications running on the
factory floor.



Knowledge Harvest
Articulation, and Delivery

Kemal A. Delic and Dominique Lahaix

A knowledge-based software tool is used
to help HP support personnel provide
customer support.



(76) Glossary



A Theoretical Derivation of
Relationships between Forecast
Errors

Jerry Z. Shan

A study of the errors associated with predicting
component replacement requirements
in the materials planning process.



89 Strengthening Software Quality 
Assurance 



Mutsuhiko Asada and Pong Mang Yan 

Reengineering a software quality assurance
program to deal with shorter
time-to-market goals.



A Compiler for HP VEE 



Steven Greenbaum and Stanley Jefferson 

The authors describe a compiler technology
that is designed to improve the execution
speed of HP VEE (Visual Engineering
Environment) programs.
An API for Interfacing Interactive 3D
Applications to High-Speed Graphics 
Hardware 



Kevin T. Lefebvre 
John M. Brown 



The OpenGL® specification defines a software interface that can be 
implemented on a wide range of graphics devices ranging from simple 
frame buffers to fully hardware-accelerated geometry processors. 







Kevin T. Lefebvre 

A senior engineer in the
graphics products laboratory
at the HP Workstation
Systems Division, Kevin Lefebvre is responsible
for the OpenGL architecture and its implementation
and delivery. He came to HP in 1986
from the Apollo Systems Division. He has a BS
degree in mathematics (1976) from Carnegie-Mellon
University. He was born in Pittsfield,
Massachusetts, is married, and has two children.
His hobbies include running, biking, and
skiing.



John M. Brown 

John Brown is a senior
engineer in the graphics
products laboratory of the
HP Workstation Systems Division. He is responsible
for graphics application performance.
John came to HP in 1988. He holds a BSEE
degree ( HliSUI from the University of Kentucky.




OpenGL is a specification for a software-to-hardware application
programming interface, or API, that defines operations needed to produce
interactive 3D applications. It is designed to be used on a wide range of
graphics devices, including simple frame buffers and hardware-accelerated
geometry processor systems. With design goals of efficiency and multiple
platform support, certain functions, such as windowing and input support,
have not been defined in OpenGL. These unsupported functions are included
in support libraries outside the core OpenGL definition.

OpenGL is targeted for use on a range of new graphics devices for both
UNIX®-based and Windows® NT-based operating system platforms. These systems
differ in both capabilities and performance.

Early in the OpenGL program at HP, industry partnerships were established
between the OpenGL R&D labs and key independent software vendors (ISVs)
to ensure a high-quality, high-performance product that met the needs of
these ISVs. These partnerships were also used to assist the ISVs in moving to
the HP OpenGL product (see "The Fast-Break Program" on page 8).

The various OpenGL articles in this issue describe the design philosophy and
the implementation of the HP version of OpenGL and other graphics products
associated with OpenGL.
History of OpenGL

OpenGL is a successor to Iris GL, a graphics library developed
by Silicon Graphics International (SGI). Major
changes have been made to the Iris GL specification in
defining OpenGL. These changes have been aimed at
making OpenGL a cleaner, more extensible architecture.

With the goal of creating a single open graphics standard,
the OpenGL Architecture Review Board (ARB) was formed
to define the specification and promote OpenGL in terms
of ISV use and availability of vendor implementations.
The original ARB members were SGI, Intel, Microsoft®,
Digital Equipment Corporation, and IBM. Evans & Sutherland,
Intergraph, Sun, and HP were added more recently.
For more information on current ARB members, OpenGL
licensees, frequently asked questions, and other
ARB-related information, visit the OpenGL web site at
http://www.opengl.org.

The initial effort of the ARB was the 1.0 specification of
OpenGL, which became available in 1992. Along with
this specification was a series of conformance tests that
licensees needed to pass before an implementation could
be called OpenGL. Since then the ARB has added new
features and released a 1.1 specification in 1995 (the HP
implementation is based on 1.1). Work is currently being
done to define a 1.2 revision of the specification.

HP Involvement in OpenGL 

HP became an OpenGL licensee in 1995. We had the goal
of delivering a native implementation of OpenGL that
would run on hardware and software that would provide
OpenGL performance leadership.

Shortly after licensing OpenGL, we established a relationship
with a third party to provide an OpenGL implementation
on our existing set of graphics hardware while we
worked on a new generation of hardware that was better
suited for OpenGL semantics. The OpenGL provided by
the third party used the underlying graphics hardware
acceleration where possible. However, it could not be
considered an accelerated implementation of OpenGL
because of features lacking in the hardware.

In August of 1996, we demonstrated our first native implementation
of OpenGL at Siggraph 96. This implementation
was fully functional and represented the software that
would be shipped with the future OpenGL-based hardware.
The implementation supported various device drivers
including a software-based renderer. The OpenGL
development effort culminated in the announcement and
delivery of OpenGL-based systems in the fall of 1997.

Software Implementation 

In our implementation, we focused on the hardware's ability
to accelerate major portions of the rendering pipeline.
For the software, we focused on its ability to ensure that
the hardware could run at full performance. A fast graphics
accelerator is of little use if the driving software cannot
keep the hardware busy. The resulting software architecture
and implementation were designed from a system
viewpoint. Decisions were based on system requirements
to avoid overoptimizing individual components while
still failing to achieve the desired results. An overview of the
HP OpenGL software architecture is provided in the article
on page 9. Another software-related issue is covered
in the article on page 35, which discusses issues associated
with porting a UNIX-based OpenGL implementation
to Windows NT.

Hardware Systems 

The new graphics systems are able to support OpenGL,
Starbase, PHIGS, and PEX rendering semantics in hardware.
Being able to support the OpenGL API means that
there is hardware support for accelerating the full feature
set of OpenGL instead of just having a simple frame buffer
in which all or most of the OpenGL features are implemented
in software. These systems are the VISUALIZE fx2,
VISUALIZE fx4, and VISUALIZE fx6 graphics accelerator
products. These systems differ in the amount of graphics
acceleration they provide, the number of image planes,
and the optional OpenGL features they provide. In addition
to the base graphics boards, a texture mapping option
is available for the fx4 and fx6 accelerators. The
article on page 28 provides an overview of the new
graphics hardware developed to support OpenGL.

Engineering Process 

To meet the required delivery dates of OpenGL with a
high level of confidence and quality, we used a new process
to compress the time between first silicon and manufacturing
release. The article on page 41 describes the
The Fast-Break Program



In basketball, a rapid offensive transition is called a fast-break.
The fast-break program is about the transition game
for OpenGL on HP systems. A key part of the HP transition to
OpenGL is applications, because applications enable volume
shipments of systems. Having the right applications is necessary
for a successful OpenGL product, but it is also important
that the applications run with outstanding performance and
reliability. Fast-break is about both aspects: getting the applications
on HP systems and ensuring that they have outstanding
performance and reliability.

Fast-break began by working with application developers in 
the early stages of the OpenGL program to understand their 
requirements for the HP OpenGL product. These requirements 
helped to drive the initial OpenGL product definition. 

As the program progressed, the Fast-break team developed a 
suite of tools that enabled detailed analysis of OpenGL appli- 
cations. Analysis of key applications was used to further refine 
our OpenGL product performance and functionality. Analysis 
also yielded a set of synthetic API benchmarks that repre- 
sented the behavior of key applications. These synthetic 
benchmarks enabled HP to perform early hands-on evaluation 
of the OpenGL product long before the actual applications 
were ported to HP. 

Pre-porting laid the groundwork for the actual porting of applications
to HP's implementation of OpenGL. The first phase of
the porting took place during the OpenGL beta program. In this
program, the HP fast-break team worked closely with selected
application developers to initiate the porting effort. A software-only
implementation of the OpenGL product was used, which
enabled the beta program to take place even before hardware
was available.

As hardware became available, the beta program was superseded
by the early access program. This program included the
original beta participants and additional selected developers.
In both the beta and early access programs, HP found that the
homework done earlier by the fast-break team paid big dividends.
Most applications were ported to HP in just a few days
and, in some cases, just a few hours!

Although not completely defect-free, these early versions of 
OpenGL were uniformly high-performance and high-quality 
products. By accelerating the application porting effort, HP 
was able to identify and resolve the few remaining issues 
before the product was officially released. 

The ongoing involvement of the fast-break team with the 
OpenGL product development teams helped HP do it right the 
first time by delivering a high-quality, high-performance imple- 
mentation of OpenGL and enabling rapid porting of key appli- 
cations to the HP product. 



engineering process we used to accelerate the time to
market for OpenGL.

Graphics Middleware 

A fast graphics API is not always enough. Leading edge
CAD modelling problems far exceed the interactive capacity
of graphical super workstations. For example, try
spinning a complete CAD model of a Boeing 777 at 30
frames per second on any system.

What is needed is a new approach to solving the rendering
problem of very large models. The goal is to trade
off between frame rate, image quality, and system cost.



HP has introduced a toolkit for use by CAD ISVs to 
assist them in solving this problem. The toolkit is called 
DirectModel and is described on page 19. 

HP-UX Release 10.20 and later and HP-UX 11.00 and later (in both 32- and 64-bit
configurations) on all HP 9000 computers are Open Group UNIX 95 branded products.

UNIX is a registered trademark of The Open Group.

X/Open is a registered trademark and the X device is a trademark of X/Open Company Limited
in the UK and other countries.

Microsoft is a US registered trademark of Microsoft Corporation.

Windows is a US registered trademark of Microsoft Corporation.

Silicon Graphics and OpenGL are registered trademarks of Silicon Graphics Inc. in the United
States and other countries.
An Overview of the HP OpenGL® Software
Architecture



Kevin T. Lefebvre 



Robert J. Casey 



Michael J. Phelps



Courtney D. Goeltzenleuchter



Donley B. Hoffman 



OpenGL is a hardware-independent specification of a 3D graphics programming
interface. This specification has been implemented on many different vendors' 
platforms with different CPU types and graphics hardware, ranging from 
PC-based board solutions to high-performance workstations. 



The OpenGL API defines an interface (to graphics hardware) that deals
entirely with rendering 3D primitives (for example, lines and polygons). The
HP implementation of the OpenGL standard does not provide a one-to-one
mapping between API functions and hardware capabilities. Thus, the software
component of the HP OpenGL product fills the gaps by mapping API functions
to OpenGL-capable systems.

Since OpenGL is an industry-standard graphics API, much of the differentiating
value HP delivers is in performance, quality, reliability, and time to market.
The central goal of the HP implementation is to ship more performance and
quality much sooner.

What is OpenGL?

OpenGL differs from other graphics APIs, such as Starbase, PHIGS, and PEX
(PHIGS extension in X), in that it is vertex-based as opposed to primitive-based.
This means that OpenGL provides an interface for supplying a single
vertex, surface normal, color, or texture coordinate parameter in each call.
Several of the calls between an OpenGL glBegin and glEnd pair define
a primitive that is then rendered. Figure 1 shows a comparison of the
different API call formats used to render a rectangle. In PHIGS a single call
could render a primitive by referencing multiple vertices and their associated
data (such as normals and color) as parameters to the call. This difference in
procedure calls per primitive (one versus eight for a shaded triangle) posed
a performance challenge for our implementation.
Figure 1

Graphics API call comparison.

Starbase:

    polygon3d(...);

PEXlib:

    PEXFillAreaSetWithData(...);

OpenGL:

    glBegin(GL_QUADS);
    glNormal(...);
    glVertex(...);
    glNormal(...);
    glVertex(...);
    glNormal(...);
    glVertex(...);
    glNormal(...);
    glVertex(...);
    glEnd();
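
To make the call-count difference concrete, the following sketch mimics a vertex-based interface and counts the procedure calls needed for one shaded triangle. The mock_ functions and the MockContext structure are hypothetical stand-ins invented for this illustration; they are not the OpenGL entry points and perform no rendering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a vertex-based API (NOT real OpenGL calls).
 * Each call records one piece of vertex data and bumps a call counter. */
#define MOCK_MAX_VERTS 16

typedef struct { float pos[3]; float normal[3]; } MockVertex;

typedef struct {
    MockVertex verts[MOCK_MAX_VERTS];
    size_t     nverts;        /* vertices collected since mock_begin */
    float      cur_normal[3]; /* current normal, latched by each vertex */
    int        in_primitive;  /* nonzero between mock_begin and mock_end */
    unsigned   calls;         /* API calls made so far */
} MockContext;

void mock_begin(MockContext *c) {
    assert(!c->in_primitive);
    c->in_primitive = 1;
    c->nverts = 0;
    c->calls++;
}

void mock_normal(MockContext *c, float x, float y, float z) {
    c->cur_normal[0] = x; c->cur_normal[1] = y; c->cur_normal[2] = z;
    c->calls++;
}

void mock_vertex(MockContext *c, float x, float y, float z) {
    assert(c->in_primitive && c->nverts < MOCK_MAX_VERTS);
    MockVertex *v = &c->verts[c->nverts++];
    v->pos[0] = x; v->pos[1] = y; v->pos[2] = z;
    for (int i = 0; i < 3; i++)
        v->normal[i] = c->cur_normal[i]; /* vertex picks up current normal */
    c->calls++;
}

/* Closes the primitive and returns the number of vertices collected. */
size_t mock_end(MockContext *c) {
    assert(c->in_primitive);
    c->in_primitive = 0;
    c->calls++;
    return c->nverts;
}

/* Issues one shaded triangle and returns the number of API calls used. */
unsigned mock_shaded_triangle_calls(void) {
    MockContext c = {0};
    mock_begin(&c);
    mock_normal(&c, 0.0f, 0.0f, 1.0f); mock_vertex(&c, 0.0f, 0.0f, 0.0f);
    mock_normal(&c, 0.0f, 0.0f, 1.0f); mock_vertex(&c, 1.0f, 0.0f, 0.0f);
    mock_normal(&c, 0.0f, 0.0f, 1.0f); mock_vertex(&c, 0.0f, 1.0f, 0.0f);
    mock_end(&c);
    return c.calls;
}
```

Under these assumptions, mock_shaded_triangle_calls() counts one begin, three normal/vertex pairs, and one end, for eight calls in total, where a primitive-based interface would need a single call.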



An OpenGL implementation consists of the following 
elements: 

■ A rendering library (GL) that implements the OpenGL
specification (the rendering pipeline)

■ A utility library (GLU) that implements useful utility
functions that are layered on top of OpenGL (for
example, surfaces, quadrics, and tessellation functions)

■ An interface to the system's windowing package, includ- 
ing GLX for X Window Systems on the UNIX operating 
system and WGL for Microsoft Windows 

Implementation Goals 

The goals we defined for the OpenGL program that helped 
to shape our implementation were to: 

■ Achieve and sustain long-term price/performance leadership
for OpenGL applications running on HP platforms

■ Develop a scalable architecture that supports OpenGL
on a wide range of HP platforms and graphics devices.

The rest of this article will provide more details about 
our OpenGL implementation and show how these goals 
affected our system design. 

OpenGL API 

In general, OpenGL defines a traditional 3D pipeline for
rendering 3D primitives. This pipeline takes 3D coordinates
as input, transforms them based on orientation or
viewpoint, lights the resulting coordinates, and then renders
them to the frame buffer (Figure 2).



To implement and control this pipeline, the OpenGL API
provides two classes of entry points. The first class is
used to create 3D geometry as a combination of simple
primitives such as lines, triangles, and quadrilaterals.
The entry points that make up this class are referred to
as the vertex API, or VAPI, functions. The second class,
called the state class, manipulates the OpenGL state used
in the different rendering pipeline stages to define how to
operate (transform, clip, and so on) on the primitive data.

VAPI Class 

OpenGL contains a series of entry points that when used
together provide a powerful way to build primitives. This
flexible interface allows an application to provide primitive
data directly from its private data structures rather
than requiring it to define structures in terms of what the
API requires, which may not be the format the application
requires.

Primitives are created from a sequence of vertices. These 
vertices can have associated data such as color, surface 
normal, and texture coordinates. These vertices can be 
grouped together and assigned a type, which defines how 
the vertices are connected and how to render the resulting 
primitive. 

The VAPI functions available to define a primitive include
glVertex (specify its coordinate), glNormal (define a surface
normal at the coordinate), glColor (assign a color to the
coordinate), and several others. Each function has several
forms that indicate the data type of the parameter (for
example, int, short, and float), whether the data is passed
as a parameter or as a pointer to the data, and whether
the data is one-, two-, three-, or four-dimensional. Altogether
there are over 100 VAPI entry points that allow for
maximum application flexibility in defining primitives.

Figure 2

Graphics pipeline.

The VAPI functions glBegin and glEnd are used to create
groups of these vertices (and associated data). glBegin
takes a type parameter that defines the primitive type and
a count of vertices. The type can be point, line, triangle,
triangle strip, quadrilateral, or polygon. Based on the type
and count, the vertices are assembled together as primitives
and sent down the rendering pipeline.
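
As a rough illustration of how a primitive type and a vertex count determine what is assembled, the sketch below computes how many complete primitives a run of vertices produces. The PrimType enumeration and the primitives_assembled function are hypothetical names chosen for this example; the counting rules follow the usual OpenGL convention that leftover vertices not forming a complete primitive are ignored.

```c
#include <assert.h>

/* Hypothetical primitive types mirroring the list in the text. */
typedef enum {
    PRIM_POINTS,
    PRIM_LINES,
    PRIM_TRIANGLES,
    PRIM_TRIANGLE_STRIP,
    PRIM_QUADS,
    PRIM_POLYGON
} PrimType;

/* Returns the number of complete primitives assembled from `count`
 * vertices of the given type. Incomplete trailing vertices are dropped. */
unsigned primitives_assembled(PrimType type, unsigned count) {
    switch (type) {
    case PRIM_POINTS:         return count;            /* one per vertex   */
    case PRIM_LINES:          return count / 2;        /* independent pairs */
    case PRIM_TRIANGLES:      return count / 3;        /* independent triples */
    case PRIM_TRIANGLE_STRIP: return count >= 3 ? count - 2 : 0;
    case PRIM_QUADS:          return count / 4;        /* independent fours */
    case PRIM_POLYGON:        return count >= 3 ? 1 : 0; /* one polygon    */
    }
    assert(!"unknown primitive type");
    return 0;
}
```

For example, seven vertices sent as independent triangles yield two triangles with one vertex left over, whereas the same seven vertices sent as a triangle strip yield five triangles, which is why strips reduce the per-primitive call overhead.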

For added efficiency and to reduce the number of procedure
calls required to render a primitive, vertex arrays
were added to revision 1.1 of the OpenGL specification.
Vertex arrays allow an application to define a set of vertices
and associated data before their use. After the vertex
data is defined, one or more rendering calls can be issued
that reference this data without the additional calls of
glBegin, glEnd, or any of the other VAPI calls.

Finally, OpenGL provides several rendering routines that
do not deal with 3D primitives, but rather with rectangular
areas of pixels. From OpenGL, an application can read,
copy, or draw pixels to or from any of the OpenGL
image, depth, or texture buffers.

State Class

The state class of API functions manipulates the OpenGL 
state machine. The state machine defines how vertices 
are operated on as they pass through the rendering pipe- 
line. There are over 100 functions in this class, each con- 
trolling a different aspect of the pipeline. In OpenGL most 
state information is orthogonal to the type of primitive
being operated on. For example, there is a single primitive 
color rather than a specific line color, polygon color, or 
point color. These state manipulation routines can be 
grouped as: 

■ Coordinate transformation 

■ Coloring and lighting 

■ Clipping 

■ Rasterization 

■ Texture mapping 

■ Fog 

■ Modes and execution.

Pipeline

Coordinate data (such as vertex, color, and surface normal)
can come directly from the application, indirectly
from the application through the use of evaluators,* or
from a stored display list that the application had previously
created. The coordinates flow into the pipeline as
discrete points and are operated on (transformed) individually.
At a certain point in the pipeline the vertices are
assembled into primitives, and they are operated on at the
primitive level (for example, clipping). Next, the primitives
are rasterized into fragments in which operations
like depth testing occur on each fragment. The final result
is pixels that are written into the frame buffer. This more
complex OpenGL pipeline is shown in Figure 3.

* Evaluators are functions that derive coordinate information based on parametric curves
or surfaces and basis functions.

Figure 3

OpenGL pipeline.

Figure 4

Transformation from object-space to window coordinates.

Conceptually, the transform stage takes application-specified
object-space coordinates and transforms them
to eye-space coordinates (the space that positions the
object with respect to the viewer) with a model-view
matrix. Next, the eye coordinates are projected with a
projection matrix, divided by the perspective component, and then
transformed by the viewport matrix to get them to screen
space (relative to a window). This process is summarized
in Figure 4.
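
The chain of transformations described above can be sketched in a few lines of C. This is a minimal illustration, not the HP implementation: the Vec4, Mat4, and Viewport types and the function names are hypothetical, the viewport step is written as a scale and offset rather than as a matrix, and the depth range is assumed to be the default [0, 1].

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;
typedef struct { float m[4][4]; } Mat4;            /* m[row][column] */
typedef struct { float x, y, width, height; } Viewport;

/* Returns the 4x4 identity matrix. */
Mat4 mat4_identity(void) {
    Mat4 r = {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}};
    return r;
}

/* Multiplies a column vector by a matrix: r = a * v. */
Vec4 mat4_mul_vec4(const Mat4 *a, Vec4 v) {
    Vec4 r;
    r.x = a->m[0][0]*v.x + a->m[0][1]*v.y + a->m[0][2]*v.z + a->m[0][3]*v.w;
    r.y = a->m[1][0]*v.x + a->m[1][1]*v.y + a->m[1][2]*v.z + a->m[1][3]*v.w;
    r.z = a->m[2][0]*v.x + a->m[2][1]*v.y + a->m[2][2]*v.z + a->m[2][3]*v.w;
    r.w = a->m[3][0]*v.x + a->m[3][1]*v.y + a->m[3][2]*v.z + a->m[3][3]*v.w;
    return r;
}

/* object space -> eye space -> clip space -> normalized -> window */
Vec4 object_to_window(const Mat4 *modelview, const Mat4 *projection,
                      Viewport vp, Vec4 obj) {
    Vec4 eye  = mat4_mul_vec4(modelview, obj);     /* model-view matrix */
    Vec4 clip = mat4_mul_vec4(projection, eye);    /* projection matrix */
    assert(clip.w != 0.0f);                        /* divide must be defined */

    /* Perspective divide yields coordinates in the range [-1, 1]. */
    Vec4 ndc = { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w, 1.0f };

    /* Viewport mapping to window-relative coordinates. */
    Vec4 win;
    win.x = vp.x + (ndc.x + 1.0f) * 0.5f * vp.width;
    win.y = vp.y + (ndc.y + 1.0f) * 0.5f * vp.height;
    win.z = (ndc.z + 1.0f) * 0.5f;                 /* assumed depth range [0,1] */
    win.w = 1.0f;
    return win;
}
```

With identity matrices and a 640-by-480 viewport at the origin, the object-space origin lands at the center of the window, which is a convenient sanity check when experimenting with the sketch.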

In the lighting stage, a color is computed for each vertex
based on the lighting state. The lighting state consists of
a number of lights, the type of each light (such as positional
or spotlight), various parameters of each light (for
example, position, pointing direction, or color), and the
material properties of the object being lit. The calculation
takes into consideration, among other things, the light
state and the distance of the coordinate to each light, resulting
in a single color for the vertex.

In rasterization, pixels are written based on the primitive
type, and the pixel value to be written is based on various
rasterization states (such as texture mapping enabled, or
polygon stipple enabled). OpenGL refers to the resulting
pixel value as a fragment because in addition to the pixel
value, there is also coverage, depth, and other state information
associated with the fragment. The depth value is
used to determine the visibility of the pixel as it interacts
with existing objects in the frame buffer. The coverage,
or alpha, value is used to blend the pixel value with the
existing value in the frame buffer.
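
A small sketch may help show how the depth and alpha values of a fragment interact with the frame buffer. The Fragment and Pixel types and the apply_fragment function are hypothetical names; the sketch assumes a "less than" depth comparison and a blend weighted by the fragment's alpha, whereas in OpenGL both the comparison and the blend function are configurable state.

```c
#include <assert.h>

/* A fragment carries color, alpha (coverage), and depth. */
typedef struct { float r, g, b, a; float depth; } Fragment;

/* A frame buffer pixel with its stored depth value. */
typedef struct { float r, g, b; float depth; } Pixel;

/* Applies one fragment to one pixel.
 * Returns 1 if the pixel was updated, 0 if the fragment was hidden. */
int apply_fragment(Pixel *dst, Fragment f) {
    assert(f.a >= 0.0f && f.a <= 1.0f);

    /* Depth test (assumed "less than"): nearer fragments win. */
    if (!(f.depth < dst->depth))
        return 0;

    /* Blend: weight the incoming color by alpha, the stored color by 1 - alpha. */
    dst->r = f.r * f.a + dst->r * (1.0f - f.a);
    dst->g = f.g * f.a + dst->g * (1.0f - f.a);
    dst->b = f.b * f.a + dst->b * (1.0f - f.a);
    dst->depth = f.depth;
    return 1;
}
```

A half-transparent red fragment in front of a black pixel therefore leaves a half-intensity red pixel, and a later fragment that lies behind it is rejected by the depth test without touching the stored color.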

Software Architecture 

One of the main design goals for the HP OpenGL software
architecture was to maximize performance where it
would be most effective. For example, we decided to
focus on reducing overhead to hardware-accelerated
paths and to base design decisions on application use,
minimizing the effort and cost required to support future
system hardware. The resulting architecture is composed
of two major components: a device-independent module
and a device-specific module. A simple block diagram is
shown in Figure 5.

The dispatch component is responsible for handling 
OpenGL API calls and sending them to the appropriate 
receiver. OpenGL can be in one of the following modes: 

■ Protocol mode in which API calls are packaged up and 
forwarded to a remote system for execution 

■ Display list creation mode in which API calls are stored
in a display list for later execution

■ Direct rendering mode in which API calls are intended
for immediate rendering on the local screen.
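
One common way to route the same API call to different receivers is a table of function pointers indexed by the current mode. The sketch below illustrates that idea only; the Dispatcher structure, the mode names, and the counters are hypothetical and do not reflect the actual HP dispatch component.

```c
#include <assert.h>

typedef enum {
    MODE_PROTOCOL,      /* package the call for a remote system */
    MODE_DISPLAY_LIST,  /* store the call for later execution   */
    MODE_DIRECT,        /* render immediately on the local screen */
    MODE_COUNT
} DispatchMode;

typedef struct {
    DispatchMode mode;
    unsigned sent_remote;  /* calls forwarded as protocol   */
    unsigned stored;       /* calls saved in a display list */
    unsigned rendered;     /* calls rendered directly       */
} Dispatcher;

typedef void (*VertexFn)(Dispatcher *, float, float, float);

/* Each receiver only counts the call; a real one would do the work. */
static void vertex_protocol(Dispatcher *d, float x, float y, float z) {
    (void)x; (void)y; (void)z;
    d->sent_remote++;
}

static void vertex_display_list(Dispatcher *d, float x, float y, float z) {
    (void)x; (void)y; (void)z;
    d->stored++;
}

static void vertex_direct(Dispatcher *d, float x, float y, float z) {
    (void)x; (void)y; (void)z;
    d->rendered++;
}

/* One receiver per mode, in the same order as DispatchMode. */
static const VertexFn vertex_table[MODE_COUNT] = {
    vertex_protocol, vertex_display_list, vertex_direct
};

void dispatcher_set_mode(Dispatcher *d, DispatchMode mode) {
    assert((int)mode >= 0 && (int)mode < (int)MODE_COUNT);
    d->mode = mode;
}

/* The API entry point: a single indirect call selects the receiver. */
void api_vertex(Dispatcher *d, float x, float y, float z) {
    vertex_table[d->mode](d, x, y, z);
}
```

The design choice illustrated here is that the mode is checked once, when it changes, rather than on every call; the per-call cost is a single indirect jump.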



Figure 5 

OpenGL architecture. 
The primary application path of any importance is the
immediate rendering path. In direct rendering mode
the performance of all functions is important, but the performance
of the VAPI calls is even more critical because
rendering calls are issued far more frequently than other
types of calls, such as state setting. Any overhead in transferring
application rendering commands to the hardware
reduces overall performance significantly. See the "System
Design Results" section in this article on page 14 for a
discussion of some of these issues.

The device-independent module is the target for all the 
OpenGL state manipulation calls, and in some situations, 
for VAPI calls such as display list or protocol generation. 
This module contains state management, all system con- 
trol logic, and a complete software implementation of 
the OpenGL rendering pipeline up to the rasterization 
stage, which is used in situations where the hardware 
does not support an OpenGL feature. The device-independent
module is made up of several submodules,
including:

■ GLX (OpenGL GLX support module) for handling win- 
dow system dependent components, including context 
management, X Window System interactions, and proto- 
col generation 

■ SUM (system utilities module) for handling system 
dependent components, including system interactions, 
global state management, and memory management 

■ OCM (OpenGL control module) for handling OpenGL
state management, parameter checking, state inquiry
support, and notification of state changes to the appropriate
module

■ PCM (pipeline control module) for handling graphics
pipeline control, state validation, and the software
rendering pipeline

■ DLM (display list module) for handling display list
creation and execution.

The device-specific module is basically an abstracted
hardware interface that resides in a separate shared library.
Based on what hardware is available, the device-independent
code dynamically loads the appropriate
device-specific module. In general the device-specific
module is called only by the device-independent module,
never by the API, and converts the requests to hardware-specific
operations (register loads, operation execute). In
addition to a device-specific module for the VISUALIZE
fx series of graphics hardware, there is a virtual memory
driver device-specific module for handling OpenGL operations
on GLX pixmaps (virtual-memory-based image
buffers) or for rendering to hardware that does not support
OpenGL semantics.

The final key component of the architecture is streamlines.
Streamlines are part of the device-specific module
but are unique in that they are associated directly with the
API. On geometry-accelerated devices like the VISUALIZE
fx series, the hardware can support the full set of VAPI
calls. To minimize overhead and maximize performance,
the calls are targeted to optimized routines that communicate
directly with the hardware. In many cases these routines
are coded in PA-RISC 1.1 or PA-RISC 2.0 assembly
language or C. At initialization time the appropriate routines
are loaded in the dispatch table based on the system
type and are dynamically selected at run time.

An important thing to understand about streamlines is
that they can only be called when the current state is
"clean" and the hardware supports the current rendering
mode. An example of "not clean" is when the viewing
matrix has been changed, and the hardware needs to be
updated with the current transformation matrix. Because
the application can make several different calls to
manipulate the matrix, computing the state based on the
viewing matrix and loading the hardware is deferred until
it is actually needed. For example, when a primitive is to
be rendered (initiated via a glBegin call), the state is
made clean (validated) by the device-independent code and
subsequent VAPI calls can be dispatched directly to the
streamlines. Another situation in which streamlines cannot
be called is when the hardware does not support a feature,
such as texture mapping in the VISUALIZE fx display
hardware. In this situation the VAPI entry points
do not target the streamlines but rather the
device-independent code that implements what is called a general
path, or in other terms, a software rendering pipeline.
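
The deferral described above can be pictured as a "dirty" flag that is set by matrix calls and cleared once, just before rendering. The following is only a minimal sketch under stated assumptions: the type gl_state and the functions state_init, set_view_matrix, and begin_primitive are hypothetical names, not part of the HP implementation, and the hardware load is simulated by a copy and a counter.

```c
#include <string.h>

/* Hypothetical state record; none of these names come from HP's code. */
typedef struct {
    float view_matrix[16];  /* latest matrix set by the application     */
    float hw_matrix[16];    /* stand-in for the matrix held by hardware */
    int   dirty;            /* nonzero when hw_matrix is out of date    */
    int   hw_loads;         /* counts simulated hardware updates        */
} gl_state;

void state_init(gl_state *s)
{
    memset(s, 0, sizeof *s);
}

/* Matrix calls only record the new value and mark the state "not clean". */
void set_view_matrix(gl_state *s, const float m[16])
{
    memcpy(s->view_matrix, m, sizeof s->view_matrix);
    s->dirty = 1;
}

/* Validation runs once, when a primitive is about to be rendered. */
void begin_primitive(gl_state *s)
{
    if (s->dirty) {
        memcpy(s->hw_matrix, s->view_matrix, sizeof s->hw_matrix);
        s->hw_loads++;
        s->dirty = 0;
    }
    /* The state is clean here, so vertex calls could take the fast path. */
}
```

In this sketch, two consecutive calls to set_view_matrix followed by one begin_primitive cause a single simulated hardware load, which is the saving that deferral is meant to provide.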

Three-Process Model 

Under the X Window System on the UNIX operating sys- 
tem, the OpenGL architecture uses a three-process model 
to support the direct and indirect semantics of OpenGL. 
In our implementation, we have leveraged our existing
direct hardware access (DHA) technology to provide
industry-leading local rendering performance. This has been
coupled with two distinct remote rendering modes, making
our OpenGL implementation one of the most flexible
implementations in the industry. These rendering modes are
based upon the three-process rendering model shown in
Figure 6. This model supports three rendering modes:
direct, indirect, and virtual.

Figure 6

Three-process rendering model.

Direct Rendering. Direct rendering through DHA provides
the highest level of OpenGL performance and is used
whenever an OpenGL application is connected to a local 
X server running on a workstation with VISUALIZE fx 
graphics hardware. For all but a few operations, the appli- 
cation process communicates directly with the graphics 
hardware, bypassing the interprocess communication 
overhead between the application and the X server. 

Indirect Rendering (Protocol). Indirect rendering is used
primarily for remote operation when the target X server is
running on a different workstation than the user
application. In this mode, the OpenGL API library emits GLX
protocol, which is interpreted by a receiving X server that
supports the GLX extension. The receiving server can be
HP, Sun Microsystems, Silicon Graphics International,
or any other X server that supports the GLX server
extension. In the HP OpenGL implementation, the receiving
X server passes nearly all GLX protocol directly on to an
OpenGL daemon process that uses DHA for maximum
performance. Note that immediate mode rendering
performance through protocol can be severely limited by the
time it takes to send geometric data over the network.
However, when display lists are used, geometric data is
cached in the OpenGL daemon and remote OpenGL
rendering can be as fast or sometimes even faster than local
DHA rendering.

Virtual Rendering. As a value-added feature, HP OpenGL 
also provides a virtual GL rendering mode not available in 
other OpenGL implementations. Virtual rendering allows 
an OpenGL application to be displayed on any X server or 
X terminal even if the GLX extension is not supported on
that server. This is accomplished by rendering through the
virtual memory driver to local memory and then issuing
the standard XPutImage protocol to display images on the
target screen. Although flexible, virtual GL is typically the 
slowest of the OpenGL rendering modes. However, virtual 
GL rendering performance can be increased significantly 
by limiting the size of the output window.
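
The three rendering modes can be summarized as a small decision function. This is a hedged sketch only: the article does not give the actual selection rules, and x_server_info, render_mode, and choose_render_mode are hypothetical names introduced for illustration.

```c
typedef enum { RENDER_DIRECT, RENDER_INDIRECT, RENDER_VIRTUAL } render_mode;

/* Hypothetical description of the target X server. */
typedef struct {
    int is_local;        /* X server runs on the application's workstation */
    int has_fx_hardware; /* VISUALIZE fx graphics hardware is present      */
    int has_glx;         /* server supports the GLX extension              */
} x_server_info;

render_mode choose_render_mode(const x_server_info *srv)
{
    if (srv->is_local && srv->has_fx_hardware)
        return RENDER_DIRECT;    /* DHA: talk to the hardware directly   */
    if (srv->has_glx)
        return RENDER_INDIRECT;  /* emit GLX protocol to the server      */
    return RENDER_VIRTUAL;       /* render in memory, ship via XPutImage */
}
```

Under these assumptions, a local server with VISUALIZE fx hardware selects direct rendering, a remote GLX-capable server selects indirect rendering, and a plain X terminal falls back to virtual rendering.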

System Design Results 

To deliver industry-leading OpenGL performance, we 
combined graphics hardware, libraries, and drivers. The 
hardware is the core enabler of performance. Although 
the excellence of each part is important, the overall system 
design is even more so. How well the operating system, 
compilers, libraries, drivers, and hardware fit together 
in the system design determines the overall result. We 
worked closely with teams in four HP R&D labs to
optimize the system design, apply our design values to
partitioning the system, balance performance bottlenecks, and
simplify the overall architecture and interfaces. The fol- 
lowing section describes some examples of applying our 
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system design principles to the most important aspects 
of 3D graphics applications. 

Improving OpenGL Application Performance 

OpenGL required a radical change from the existing
(legacy) HP graphics APIs. In analyzing the model for
our legacy graphics APIs, we realized that the same model
would have considerable overhead for OpenGL, which
requires many more procedure calls. Figure 1 compares the
calls required to generate the same shaded quadrilateral.

To have a competitive OpenGL, we needed to reduce or
eliminate function calls and locking overhead. We did this
with two system design initiatives called fast procedure
calls and implicit device locking.

Fast Procedure Calls. Two of our laboratories (the
Graphics Systems Laboratory and the Cupertino Language
Laboratory) worked together to create a specification for a
new, faster calling convention for making calls to shared
library components. This reduced the cost to one-fourth
the cost of the previous mechanism.

OpenGL is a state machine. When the application calls an
OpenGL function, different things happen depending on
the current state. We also wanted to support different
devices with varying degrees of support in the same OpenGL
library. We needed a dynamic method of dispatching API
function calls to the correct code to enable the appropriate
functionality without compromising performance. Given
this requirement, a naive implementation of OpenGL
might define each of its API functions like the following:

void glVertex3fv(const GLfloat *v)
{
    switch (context.whichFunction)
    {
        case HW_STREAMLINE:
            HW_STREAMLINE_glVertex3fv(v);
            break;
        case GENERAL_PATH:
            GENERAL_PATH_glVertex3fv(v);
            break;
        case GLX_PROTOCOL:
            GLX_PROTOCOL_glVertex3fv(v);
            break;
        case DISPLAY_LIST:
            DISPLAY_LIST_glVertex3fv(v);
            break;
    }
}



However, this is a very impractical implementation in
terms of both performance and software maintainability.
We decided that the most efficient method of achieving
this kind of dynamic dispatching was to retarget the API
function calls at their source, the application code. Any
call into a shared library is really a call through a pointer.
The procedure name that the application calls is
associated with a particular pointer. Conceptually, what we
needed was a mechanism to manage the contents of
those pointers. To accomplish this, we needed more
assistance from the engineers in the compiler and linker
groups.

In simplified terms, the OpenGL library maintains a
procedure link table. Each entry in the procedure link table is
associated with a particular function name and is
composed of two pointers. One points to the code that is to
be called, and the other, the link table pointer, points to
the table used by shared library code (known as PIC, or
position-independent code) to locate global data. When
the compiler generates a call to an OpenGL function, it
loads the appropriate registers with the two fields in the
associated procedure link table entry and then branches
to the function. Since OpenGL controls the contents of
the procedure link table, it can change the contents of
these fields during execution. This allows OpenGL to
choose the appropriate code based on the OpenGL state
dynamically.
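
The retargeting idea can be approximated in portable C with an ordinary function pointer that is rewritten when state changes. This is only an analogy under stated assumptions: the real mechanism rewrites procedure link table entries with compiler and linker support, whereas the sketch below uses a plain pointer, and all names in it (vertex3fv_entry, retarget, my_vertex3fv, and the two path functions) are hypothetical.

```c
/* Hypothetical one-entry dispatch table; names are illustrative only. */
typedef void (*vertex3fv_fn)(const float *v);

int hw_calls = 0;       /* counts calls that took the hardware path */
int general_calls = 0;  /* counts calls that took the software path */

void hw_streamline_vertex3fv(const float *v) { (void)v; hw_calls++; }
void general_path_vertex3fv(const float *v)  { (void)v; general_calls++; }

/* The "table": the pointer that the application-facing call goes through. */
vertex3fv_fn vertex3fv_entry = hw_streamline_vertex3fv;

/* Called on state changes, not per vertex: retarget the entry once. */
void retarget(int texture_enabled, int hw_supports_texture)
{
    if (texture_enabled && !hw_supports_texture)
        vertex3fv_entry = general_path_vertex3fv;
    else
        vertex3fv_entry = hw_streamline_vertex3fv;
}

/* Per-vertex call: one indirect branch, no switch on state. */
void my_vertex3fv(const float *v)
{
    vertex3fv_entry(v);
}
```

Compared with the switch-based version, the cost of choosing a code path is paid when the state changes rather than on every vertex call.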

For example, assume that we have a graphics device
that, except for texture mapping, supports the OpenGL
pipeline in hardware. In this case the scheduling code
will find texture mapping enabled (meaning that the
device cannot handle texture mapping) and choose the
GENERAL_PATH_glVertex3fv code path, which performs
software texture mapping. The HW_STREAMLINE_glVertex3fv
code paths are taken if texture mapping is not enabled.

Implicit Device Locking. Graphics devices are a shared
system resource. As such, there must be some control
when an application has access to the graphics device so
that two applications are not attempting to use the device
at the same time. Normally the operating system manages
such shared resources via standard operating system
interfaces (open, close, read, write, and ioctl).

However, to get the maximum performance possible
for graphics applications, a user process will access the
graphics device directly through our 3D API libraries,
rather than use the standard operating system interfaces.
This means that before OpenGL, the HP graphics libraries
had to assume the task of managing shared access to the
graphics device.

Before OpenGL, we used a relatively lightweight fast lock
at the entry and exit of those library routines that actually
accessed the device. With the high frequency of function
calls in OpenGL, performing this lock and unlock step
for each function call would exact a severe performance
penalty, similar to the procedure call problem discussed
earlier.

To solve this problem, HP engineers invented a technique 
called implicit device locking. When a process tries to 
access the graphics hardware and does not own the 
device, a virtual memory protection fault exception will 
be generated. The kernel must detect that this protection 
fault was an attempted graphics device access instead of 
a fault from trying to access something like an invalid 
address, a swapped-out page, or a copy-on-write page.

The graphics fault alerts the system that there is another
process trying to access the graphics device. The kernel
then makes sure that the graphics device context is saved,
and the graphics context for the next process is restored.
After the graphics context switch is complete, the new
process is allowed to continue with access to the device,
and permission is taken away from all other processes.
This allows the current process that owns the device to
have zero overhead access.
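
The ownership hand-off described above can be modeled with a small simulation. This is a toy model under stated assumptions, not HP-UX kernel code: gfx_device, device_init, handle_graphics_fault, and device_access are hypothetical names, and the ownership test that the memory protection hardware performs for free is written here as an explicit comparison.

```c
/* Toy model of implicit device locking; not HP-UX kernel code. */
typedef struct {
    int owner;             /* process id that may touch the device     */
    int faults;            /* protection faults taken                  */
    int context_switches;  /* graphics context save/restore operations */
} gfx_device;

void device_init(gfx_device *d, int first_owner)
{
    d->owner = first_owner;
    d->faults = 0;
    d->context_switches = 0;
}

/* Stand-in for the kernel's fault handler: save the old context,
   restore the new one, and move access permission to the new owner. */
void handle_graphics_fault(gfx_device *d, int pid)
{
    d->faults++;
    d->context_switches++;
    d->owner = pid;
}

/* A device access. The owner pays nothing; any other process faults first. */
void device_access(gfx_device *d, int pid)
{
    if (d->owner != pid)
        handle_graphics_fault(d, pid);
    /* ... device registers would be written here ... */
}
```

In the model, any number of consecutive accesses by the owning process record no faults; a fault and a context switch are recorded only when a different process touches the device.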

This method removes the requirement that the 3D graphics
API library must explicitly lock the graphics device while
accessing it. This means that the overhead associated
with device locking, which was an order of magnitude
more than with Starbase, is completely eliminated (see
Figure 7).

This dramatic improvement in performance is made
possible by improvements in the HP-UX kernel and careful
design of the graphics hardware. The basic idea is that
when multiple graphics applications are running, the
HP-UX kernel will ensure that each application gets its
fair share of exclusive time to access the graphics device.

OpenGL was not the only API to benefit from implicit
locking. The generality of the design allowed us to use
the same mechanism to eliminate the locking code from
Starbase as well. Keeping the whole system in mind
while developing this technology allowed us to expand
the benefit beyond the original problem of excessive
overhead from locking for OpenGL.



Figure 7 

State count comparison. 
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Hardware and Software Trade-offs 

Keeping the whole picture in mind allowed us to make
software and hardware trade-offs to simplify the system
design. The criteria were based on performance
criticality, frequency of use, system complexity, and factory cost.

For example, the hardware was designed to understand 
both OpenGL and Starbase windows. OpenGL requires 
the window origin to be in the lower left corner, whereas
Starbase requires it to be in the upper left. Putting the 
intelligence in the hardware reduced the overall system 
complexity. 
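
For comparison, the conversion that software would otherwise have to apply to every window-relative row can be sketched as follows. The function name flip_y is hypothetical, and the sketch assumes rows are numbered from 0 to window_height - 1.

```c
/* Convert a row index between a lower-left-origin window (OpenGL)
   and an upper-left-origin window (Starbase). The mapping is its
   own inverse. window_height is in pixels. */
int flip_y(int y, int window_height)
{
    return window_height - 1 - y;
}
```

Handling both conventions in hardware removes this per-coordinate adjustment from the software path.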

Nearly all OpenGL features are hardware accelerated. Of 
course, all vertex API formats and dimensions are stream- 
lined and accelerated in hardware for maximum primitive 
performance. Similarly, all fragment pipeline operations 
had to be supported in hardware because fragment
operations touch every pixel and software performance would
not be sufficient. To maximize primitive performance, we 
also hardware-accelerated nearly every geometry pipeline 
feature. For example, all lighting modes, fog modes, and 
arbitrary clip planes are hardware-accelerated. Very few 
OpenGL features are not hardware-accelerated. 

Based on infrequent use and the ability to reasonably
accelerate in software, we implemented the following
functions in software: RasterPos, Selection, Feedback, Indexed
Lighting, and Indexed Fog. Infrequent use and factory cost
also encouraged us to implement accumulation buffer
support in software. (Accumulation is an operation that
blends data between the frame buffer and the
accumulation buffer, allowing effects like motion blur.)

State Change 

Through systems design we achieved dramatic results in 
application performance by focusing on the design for 
OpenGL state change operations. 

Application graphics performance is a function of both 
primitive and state change (attributes) performance. We 
have designed our OpenGL implementation to maximize 
primitive performance and minimize the costs of state
changes. 

State changes include all the function calls that modify the
OpenGL modal state, including coordinate transformations,
lighting state, clipping state, rasterization state, and texture
state. State change does not include primitive calls, pixel
operations, display list calls, or current state calls. Current
state encompasses all the OpenGL calls that can
occur either inside or outside glBegin() and glEnd() pairs
(for example, glColor(), glNormal(), glVertex()).

There are two classes of state changes: fragment pipeline 
and geometry pipeline. Fragment pipeline state changes 
control the back end, or rasterization stage, of the graphics
pipeline. This state includes the depth test enable (z-buffer
hidden surface removal) and the line stipple definition
(patterned lines such as dash or dot). Geometry pipeline
state changes control the front end of the graphics pipe- 
line. This state includes transformation matrices, lighting 
parameters, and front and back culling parameters. Frag- 
ment pipeline state changes are generally less costly than 
geometry pipeline state changes. 

Our systems design focussed on several areas that resulted 
in large application performance gains. We realized that 
the performance of our state change implementation could 
significantly affect application performance. We decided 
that this was important enough to require a redesign of 
the state change modules and not just tuning. Applying
these considerations led us to implement immediate and 
deferred validation schemes and provide redundancy 
checks at the beginning of each state change entry point. 

Validation. We implemented different immediate and
deferred validation schemes for different classes of state
changes. Geometry pipeline state changes are handled by
deferred validation because they tend to be more
complex, requiring massaging of the state. They are also more
interlocked because changing one piece of state requires
modifying another piece of state (for example, matrix
changes cause changes to the light state). For us, deferred
validation resulted in a simple design and increased
performance, reliability, and maintainability. For fragment
pipeline state changes, we chose immediate validation
because this state is relatively simple and noninterlocked.

* Validation is the mechanism that verifies that the current specified state is legal,
computes derived information from the current state necessary for rendering (for example, an
inverse matrix for lighting based on the current model matrix), and loads the hardware
with the new state.

Redundancy Checks. Redundancy checks are done for all
OpenGL API calls. Because our analysis showed that
applications often call state changing routines with a
redundant state (for example, new value==current value), we
wanted a design in which this case performs well.
Therefore, our design includes redundancy checks at the
beginning of each state change entry point, which allows a quick
return without exercising the unnecessary validation code.
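
A redundancy check of this kind can be sketched for a single piece of fragment pipeline state. The names frag_state, validate_depth_test, and set_depth_test are hypothetical, and the counter stands in for the real validation work; this is an illustration under those assumptions, not the library's code.

```c
/* Hypothetical fragment of a state-change entry point. */
typedef struct {
    int depth_test;    /* current value of the depth-test enable */
    int validations;   /* counts how often validation code ran   */
} frag_state;

void validate_depth_test(frag_state *s)
{
    s->validations++;  /* stand-in for loading the hardware */
}

void set_depth_test(frag_state *s, int enable)
{
    if (s->depth_test == enable)
        return;                /* redundant call: quick return */
    s->depth_test = enable;
    validate_depth_test(s);    /* immediate validation for fragment state */
}
```

Setting the same value twice in a row runs the validation stand-in only once; the second call returns at the comparison.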

Results. For state-change-intensive applications, these
design decisions put us in a leadership position for 
OpenGL application performance, and we achieved 
greater than a 2x performance gain over our previous 
graphics libraries. Smaller application performance gains 
were achieved throughout our OpenGL implementation 
with the State-Change design. 



Conclusion 



ISVs and customers indicate that we have met the
application leadership price and performance goals that we
set at the start of the program. We have also exceeded the
performance metrics we committed to at the beginning of
the project. For more information regarding our
performance results, visit the web site:

http://www.spec.org/gpc/opc

For long-term sustainability of our price and performance
leadership, we have continued working closely with our
ISVs to tune our implementation in areas that improve
application performance. In addition, new CPUs are
planned that will allow our implementation to run faster
without any effort on our part, and cost reductions are
continuing in graphics hardware.

The goal to develop an implementation that can support a 
wide range of CPU or graphics devices has already been 
demonstrated. We support three graphics devices that 
have different performance levels (all based on the same 
hardware architecture) and a pure software implementa- 
tion that supports simple frame buffer devices on UNIX 
and Windows NT systems. 
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Graphics Needs of Technical Applications 
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The increasing use of 3D modeling for highly complex mechanical designs has 
led to a demand for systems that can provide smooth interactivity with 3D 
models containing millions or even billions of polygons.

DirectModel* is a toolkit for creating technical 3D graphics applications.
Its primary objective is to provide the performance necessary for interactive
rendering of large 3D geometry models containing millions of polygons.
DirectModel is implemented on top of traditional 3D graphics applications
programming interfaces (APIs), such as Starbase or OpenGL. It provides the
application developer with high-level 3D model management and advanced
geometry culling and simplification techniques. Figure 1 shows DirectModel's
position within the architecture of a 3D graphics application.

This article discusses the role of 3D modeling in design engineering today, the 
challenges of implementing 3D modeling in mechanical design automation 
(MDA) systems, and the 3D modeling capabilities of the DirectModel toolkit.

Visualization in Technical Applications 

The Role of 3D Data 

3D graphics is a diverse field that is enjoying rapid progress on many fronts.
Significant advances have been made recently in photorealistic rendering, 
animation quality, low-cost game platforms, and state-of-the-art immersive 

* DirectModel was jointly developed by Hewlett-Packard and Engineering Animation Incorporated of Ames, Iowa.



Figure 1 

Application architecture. 
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Figure 2 

A low-resolution image of a 3D model of an engine 
consisting of 150,000 polygons. 




virtual reality* applications. The Internet is populated
with 3D virtual worlds, and software catalogs are full of
applications for creating them. An example of a 3D model
is shown in Figure 2.

What do these developments mean for the users of
technical applications (the scientists and engineers who
pioneered the use of 3D graphics as a tool for solving
complex problems)? In many ways this technical community
is following the same trends as the developers and users
of nontechnical applications such as 3D games and
interactive virtual worlds. They are interested in finding less
expensive systems for doing their work, their image
quality standards are rising, and their patience with poor
interactive performance is wearing thin.

However, there are other areas where the unique aspects
of 3D data for technical applications create special
requirements. In many applications the images created from the
3D data that are displayed to the user are the goal. For
example, the player of a game or the pilot in a flight
simulator cares a lot about the quality and interactivity of
the images, but cares very little about the data used by the
system to create those images. In contrast, many
technical users of 3D graphics consider their data to be the most
important component. The goal is to create, analyze, or
improve the data, and 3D rendering is a useful means to
that end.

* Immersive virtual reality is a technology that "immerses" the viewer into a virtual reality
scene with head-mounted displays that change what is viewed as the user's head rotates
and with gloves that sense where the user's hand is positioned and apply force feedback.

This key distinction between data that is the goal itself
and data that is a means to an end leads to major
differences in the architectures and techniques for working with
those data sets.

3D Model Complexity 

Understanding the very central role that data holds for
the technical 3D graphics user immediately leads to two
questions: what is that data, and what are the significant
trends over time? The short answer is that the data is
big, and its amount and complexity are
increasing rapidly. For example, a mechanical engineer
doing stress analysis may now be tackling problems
modeled with millions of polygons instead of the
thousands that sufficed a few years ago.

The trends in the mechanical design automation (MDA)
industry are good examples of the factors causing this
growth. In the not-too-distant past mechanical design was
accomplished using paper and pencil to create part
drawings, which were passed on to the model shop to create
prototype parts, and then they were assembled into
prototype products for testing. The first step in computerizing
this process was the advent of 2D mechanical drafting
applications that allowed the mechanical engineers to
replace their drafting boards with computers. However,
the task was still to produce a paper drawing to send to
the model shop. The next step was to replace these 2D
drafting applications with 3D solid modelers that could
model the complete 3D geometry of a part and support
tasks such as static and dynamic design analysis to find
such things as the stress points when the parts move. This
move to 3D solid modeling has had a big impact at many
companies as a new technique for designing parts.
However, in many cases it has not resulted in a fundamental
change to the process for designing and manufacturing
whole products.

Advances. In the last few years advances in the
mechanical design automation industry have increasingly
addressed virtual prototyping and other whole-product
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Fahrenheit 



Hewlett-Packard, Microsoft, and Silicon Graphics are collabo- 
rating on a project, code-named "Fahrenheit," that will define 
the future of graphics technologies. Based on the creation of a 
suite of APIs for DirectX on the Windows® and UNIX* operat- 
ing systems, the Fahrenheit project will lead to a common, 
extensible architecture for capitalizing on the rapidly expand- 
ing marketplace for graphics. 



Fahrenheit will incorporate the Microsoft Direct3D and Direct- 
Draw APIs with complementary technologies from HP and 
Silicon Graphics. HP is contributing DirectModel to this effort 
and is working with Microsoft and Silicon Graphics to define 
the best integration of the individual technologies. 



design issues. This desire to create new tools and 
processes that allow a design team to design, assemble, 
operate, and analyze an entire product in the computer is
particularly strong at companies that manufacture large 
and complex products such as airplanes, automobiles, 
and large industrial plants. The leading-edge companies 
pioneering these changes are finding that computer-based 
virtual prototypes are much cheaper to create and easier 
to modify than traditional physical prototypes. In addition
they support an unprecedented level of interaction among 
multiple design teams, component suppliers, and end users 
that are located at widely dispersed sites. 

This move to computerized whole-product design is in 
turn leading to many new uses of the data. If the design 
engineers can interact online with their entire product, 
then each department involved in product development
will want to be involved. For example, the marketing 
department wants to look at the evolving design while 
planning their marketing campaign, the manufacturing 
department wants to use the data to ensure the product's 
manufacturability, and the sales force wants to start
showing it to customers to get their feedback.

These tasks all drive an increased demand for realistic 
models that are complete, detailed, and accurate. For 
example, mechanical engineers are demanding new levels 
of realism and interactivity to support tasks such as
positioning the fasteners that hold piping and detecting
interferences created when a redesigned part bumps into one
of the fasteners. This is a standard of realism that is very
different from the photorealistic rendering requirements
of other applications and, to the technical user, a higher
priority.



Larger Models. These trends of more people using better 
tools to create more complete and complex data sets 
combine to produce very large 3D models. To under- 
stand this complexity, imagine a complete 3D model of 
everything you see under the hood of your car. A single
part could require at least a thousand polygons for a
detailed representation, and a product such as an
automobile is assembled from thousands of parts. Even a small
product such as an HP DeskJet printer that sits on the
corner of a desk requires in excess of 300,000 triangles'
for a detailed model. A car door with its smooth curves, 
collection of controls, electric motors, and wiring har- 
ness can require one million polygons for a detailed 
model, and the car's power train can consist of 30 million
polygons. 2 

These numbers are large, but they pale in comparison to 
the size of nonconsumer items. A Boeing 777 airplane 
contains approximately 132,500 unique parts and over
3,000,000 fasteners, ' yielding a 3D model containing more 
than 500,000,000 polygons. 1 A study that examined the
complexity of naval platforms determined that a sub- 
marine is approximately ten times more complex than 
an airplane, and an aircraft carrier is approximately ten 
times more complex than a submarine. 1 3D models con- 
taining hundreds of millions or billions of polygons are 
real today. 

As big as these numbers are, the problem does not stop 
there. Designers, manufacturers, and users of these
complex products not only want to model and visualize the
entire product, but they also want to do it in the context 
of the manufacturing process and in the context in which 
it is used. If the ship and the dry dock can be realistically 
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modeled and combined, it will be far less expensive to 
find and correct problems before they are built. 

Current System Limitations 

If the task faced by technical users is to interact with very
large 3D models, how are the currently available systems
doing? In a word, badly. Clearly the graphics pipeline
alone is not going to solve the problem even with
hardware acceleration. Assuming that rendering performance
for reasonable interactivity must be at least 10 frames per
second, a pipeline capable of rendering 1,000,000
polygons per second has no hope of interactively rendering
any model larger than 100,000 polygons per frame. Even
the HP VISUALIZE fx, the world's fastest desktop
graphics system, which is capable of rendering 4.6 million
triangles per second, can barely provide 10 frames per
second interactivity for a complete HP DeskJet printer
model.
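
The frame budget behind these figures is a single division. The helper below is a small sketch with a hypothetical name, max_polygons_per_frame, using only the numbers quoted above.

```c
/* Largest model (in polygons) that a pipeline can redraw every frame. */
long max_polygons_per_frame(long polygons_per_second, long frames_per_second)
{
    return polygons_per_second / frames_per_second;
}
```

At 10 frames per second, a pipeline that renders 1,000,000 polygons per second can redraw 100,000 polygons per frame, and one that renders 4.6 million triangles per second can redraw 460,000, which leaves little margin over a printer model of more than 300,000 triangles.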

This is a sobering reality faced by many mechanical 
designers and other technical users today. Their systems 
work well for dealing with individual components but 
come up short when facing the complete problem. 

Approaches to Solving the Problem 

There are several approaches to solve the problem of ren- 
dering very complex 3D models with interactive perfor- 
mance. One approach is to increase the performance 
of the graphics hardware. Hewlett-Packard and other 
graphics hardware vendors are investing a lot of effort 
in this approach. However, increasing hardware perfor- 
mance alone is not sufficient because the complexity 
of many customers' problems is increasing faster than
gains in hardware performance. A second approach
that must also be explored involves using software
algorithms to reduce the complexity of the 3D models that
are rendered.

Complex Data Sets 

To understand the general data complexity problem, we
must examine it from the perspective of the application
developer. If a developer is creating a game, then it is
perfectly valid to search for ways to create the imagery
while minimizing the amount of data behind it. This
approach is served well by techniques such as extensive
use of texture maps on a relatively small amount of
geometry. However, for an application responsible for
producing or analyzing technical data, it is rarely effective to
improve the rendering performance by manually altering
and reducing the data set. If the data set is huge, the
application must be able to make the best of it during 3D
rendering. Unfortunately, the problem of exponential
growth in data complexity cannot be solved through
incremental improvements to the performance of the
current 3D graphics architectures; new approaches are
required.

Pixels per Polygon 

Although the problem of interactively rendering large 3D 
models on a typical engineering workstation is challenging, 
it is not intractable. If the workstation's graphics pipeline 
is capable of rendering a sustained 200,000 polygons per
second (a conservative estimate), then each frame must
be limited to 20,000 polygons to maintain 10 frames per
second. A typical workstation with a 1280 by 1024 moni- 
tor provides 1,310,720 pixels. To cover this screen com- 
pletely with 20,000 polygons, each polygon must have an 
average area of 66 pixels. A more realistic estimate is that 
the rendered image covers some subset of the screen, say 
75 percent, and that several polygons, for example four, 
Overlap on each pixel, which implies each polygon must 
cover an area of approximately 200 pixels. 

On a typical workstation monitor with a screen resolution
of approximately 100 pixels per inch, these polygons are a
bit more than 0.1 inch on a side. Polygons of this size will
create a high enough quality image for most engineering
tasks. This image quality is even more compelling when
you consider that it is the resolution produced during
interactive navigation. A much higher-quality image can
be rendered within a few seconds when the user stops
interacting with the model. Thus, today's 3D graphics
workstations have enough rendering power to produce
the fast, high-quality images required by the technical
user.
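The arithmetic behind these estimates can be checked with a
short calculation. The sketch below uses only the figures
quoted in the text; the function and parameter names are
illustrative and are not part of any HP product.

```python
import math

def polygon_budget(polygons_per_second, frames_per_second):
    """Polygons that can be drawn in one frame at the target frame rate."""
    return polygons_per_second // frames_per_second

def pixels_per_polygon(width, height, budget, coverage=1.0, overlap=1.0):
    """Average pixel area per polygon when `budget` polygons fill the image.

    coverage: fraction of the screen covered by the rendered image
    overlap:  average number of polygons that land on each covered pixel
    """
    return width * height * coverage * overlap / budget

budget = polygon_budget(200_000, 10)                  # polygons per frame
full_screen = pixels_per_polygon(1280, 1024, budget)  # every pixel covered once
realistic = pixels_per_polygon(1280, 1024, budget, coverage=0.75, overlap=4)
side_inches = math.sqrt(realistic) / 100              # at 100 pixels per inch

print(budget, round(full_screen), round(realistic), round(side_inches, 2))
```

Treating each polygon as a square, an area of roughly 200
pixels corresponds to a side of about 14 pixels, which is
where the figure of a bit more than 0.1 inch comes from.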

Software Algorithms 

The challenge of interactive large model rendering is
sorting through the millions of polygons in the model and
choosing (or creating) the best subset of those polygons
that can be rendered in the time allowed for the frame.
Algorithms that perform this geometry reduction fall into
two broad categories: culling, which eliminates unnecessary
geometry, and simplification, which replaces some
set of geometry with a simpler version.

Figure 3
Geometry culling.

Figure 3 illustrates two types of culling: view frustum
culling (eliminating geometry that is outside of the user's
field of view) and occlusion culling (eliminating geometry
that is hidden behind some other geometry). Another
article in this issue describes the implementation of
occlusion culling in the VISUALIZE fx graphics accelerator.

Figures 4 and 5 show two types of simplification. Figure 
4 shows a form of geometry simplification called tessella- 
tion, which takes a mathematical specification of a smooth 
surface and creates a polygonal representation at the spe- 
cified level of resolution. 



The decimation simplification technique is shown in
Figure 5. This technique reduces the number of polygons
in a model by combining adjacent faces and edges.

The simplified geometry created by these algorithms is
used by the level-of-detail selection algorithms, which
choose the appropriate representation to render for each
frame based on criteria such as the distance to the object.
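A minimal sketch of distance-based level-of-detail selection
follows. It assumes each representation carries a maximum
viewing distance up to which it should be used; the class,
the thresholds, and the polygon counts are hypothetical and
are not taken from DirectModel.

```python
import math
from dataclasses import dataclass

@dataclass
class Representation:
    name: str
    polygon_count: int
    max_distance: float  # farthest viewing distance at which this level is used

def select_level_of_detail(representations, object_position, eye_position):
    """Return the most detailed representation allowed at the current distance."""
    distance = math.dist(object_position, eye_position)
    ordered = sorted(representations, key=lambda rep: rep.max_distance)
    for rep in ordered:
        if distance <= rep.max_distance:
            return rep
    return ordered[-1]  # beyond every threshold: fall back to the coarsest level

levels = [
    Representation("full detail", 100_000, 10.0),
    Representation("decimated", 10_000, 50.0),
    Representation("coarse", 500, math.inf),
]
near = select_level_of_detail(levels, (0, 0, 0), (0, 0, 5))
far = select_level_of_detail(levels, (0, 0, 0), (0, 0, 500))
```

Distance is only one possible criterion; a real selector
might also weigh projected screen size or the time remaining
in the frame.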

Most 3D graphics pipelines render a model by rendering
each primitive such as a polygon, line, or point
individually. If the model contains a million polygons, then the
polygon-rendering algorithm is executed a million times.
In contrast, these geometry reduction algorithms must
operate on the entire 3D model at once, or a significant
portion of it, to achieve adequate gains. View frustum
culling is a good example: the conventional 3D graphics
pipeline will perform this operation on each individual
polygon as it is rendered. However, to provide any
significant benefit to the large model rendering problem, the
culling algorithm must be applied globally to a large chunk
of the model so that a significant amount of geometry can
be eliminated with a single operation. Similarly, the
geometry simplification algorithms can provide greatest gains
when they are applied to a large portion of the model.
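The benefit of culling a large chunk of the model at once
can be sketched with a hierarchy of bounding spheres. This
is an illustration under stated assumptions, not the
DirectModel algorithm: the node layout, the plane
representation (a unit normal pointing into the view volume
plus an offset), and the two-plane "frustum" are all
hypothetical simplifications.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    center: tuple
    radius: float
    polygons: int = 0                       # polygons stored directly in this node
    children: list = field(default_factory=list)

def signed_distance(plane, point):
    """Distance from the plane; positive values lie inside the view volume."""
    (nx, ny, nz), d = plane
    return nx * point[0] + ny * point[1] + nz * point[2] + d

def collect_visible(node, planes, visible, stats):
    """Depth-first traversal that discards a whole subtree with one sphere test."""
    stats["tests"] += 1
    if any(signed_distance(p, node.center) < -node.radius for p in planes):
        return                              # sphere entirely outside one plane
    visible.append(node)
    for child in node.children:
        collect_visible(child, planes, visible, stats)

# Simplified view volume: the slab -10 <= x <= 10.
planes = [((1, 0, 0), 10), ((-1, 0, 0), 10)]

far_group = Node((-100, 0, 0), 5,
                 children=[Node((-100, 0, 0), 1, polygons=1000) for _ in range(100)])
near_group = Node((0, 0, 0), 5, children=[Node((0, 0, 0), 1, polygons=1000)])
root = Node((-50, 0, 0), 60, children=[far_group, near_group])

visible, stats = [], {"tests": 0}
collect_visible(root, planes, visible, stats)
visible_polygons = sum(node.polygons for node in visible)
```

In this example a single test on the far group eliminates
100 child nodes and 100,000 polygons, whereas a
primitive-at-a-time pipeline would examine each polygon.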

Desired Solution 

The performance gap (often several orders of magnitude)
between the needs of the technical user and the
capabilities of a typical system puts developers of technical
applications into an unfortunate bind. Developers are often
experts in some technical domain that is the focus of their
applications, perhaps stress analysis or piping layout.
However, the 3D data sets that the applications manage
are exceeding the graphics performance of the systems
they run on. Developers are faced with the choice of
obtaining the 3D graphics expertise to create a sophisticated
rendering architecture for their applications, or seeing
their applications lag far behind their customers' needs
for large 3D modeling capacity and interactivity.

Figure 4
Geometry tessellation. (Panels: smooth curve, fine
tessellation, coarse tessellation.)

Figure 5
Geometry decimation. (Panels: full detail geometry,
decimated geometry.)

May 1998 • The Hewlett-Packard Journal

© Copr. 1949-1998 Hewlett-Packard Co.

Figure 6
Extended graphics pipeline. (Recoverable labels:
model-based operations, including simplification;
primitive-based operations, including transformation and
lighting and shading.)

To develop applications with the performance demanded
by their customers, developers need access to graphics
systems that provide dramatic performance gains for their
tasks and data. As shown in Figure 6, the graphics
pipeline available to the applications must be extended to
include model-based optimizations, such as culling and
simplification, so that it can support interactive rendering
of very large 3D models. When the graphics system
provides this level of performance, application developers
are free to focus on improving the functionality of their
applications without concern about graphics performance.
The article on page 9 describes the primitive-based
operations of the pipeline shown in Figure 6.

DirectModel Capabilities 

DirectModel is a toolkit for creating technical 3D graphics
applications. The engineer or scientist who must create,
visualize, and analyze massive amounts of 3D data does
not interact directly with DirectModel. DirectModel
provides high-level 3D model management of large 3D
geometry models containing millions of polygons. It uses
advanced geometry simplification and culling algorithms
to support interactive rendering. Figure 1 shows that
DirectModel is implemented on top of traditional 3D
graphics APIs such as Starbase or OpenGL. It extends,
but does not replace, the current software and hardware
3D rendering pipeline.

Key aspects of the DirectModel toolkit include: 

■ A focus on the needs of technical applications that deal
with large volumes of 3D geometry data

■ Capability for cross-platform support of a wide variety
of technical systems

■ Extensive support of MDA applications (for example,
translators for common MDA data types).

Technical Data 

As discussed above, the underlying data is often the most
important item to the user of a technical application. For
example, when designers select parts on the screen and
ask for dimensions, they want to know the precise
engineering dimension, not some inexact dimension that
results when the data is passed through the graphics system
for rendering. DirectModel provides the interfaces that
allow the application to specify and query data with this
level of technical precision.

Technical data often contains far more than graphical
information. In fact, the metadata such as who created the
model, what it is related to, and the results of analyzing it
is often much larger than the graphical data that is
rendered. Consequently, DirectModel provides the interfaces
that allow an application to create the links between the
graphical data and the vast amount of related metadata.

Components of large models are often created, owned, 
and managed by people or organizations that are loosely 
connected. For example, one design group might be 
responsible for the fuselage of an airplane while a sepa- 
rate group is responsible for the design of the engines. 
DirectModel supports this multiteam collaboration
by allowing a 3D model to be assembled from several 
smaller 3D models that have been independently defined 
and optimized. 

Multiple Representations of the Model 

The 3D model is the central concept of DirectModel: the
application defines the model and DirectModel is
responsible for high-performance optimization and rendering of
it. The 3D model is defined hierarchically by the model
graph, which consists of a set of nodes linked together
into a directed, acyclic graph. However, a common
problem that occurs when dealing with a model graph is the
conflict between the needs of the application and the needs
of the graphics system. The application typically needs to
organize the model based on the logical relationships
between the components, whereas the graphics system
needs to organize the model based on the spatial
relationships so that it can be efficiently simplified, culled,
and rendered. Figure 7 shows two model graphs for a car,
one organized logically and one spatially.

Figure 7
Logical and spatial organization.

Graphics toolkits that use a single model graph for both
the application's interaction with the model and for
rendering the model force the application developer to
optimize for one use while making the other use difficult. In
contrast, DirectModel maintains multiple organizations of
the model so that it can simultaneously be optimized for
several different uses. The application is free to organize
its model graph based on its functional requirements
without consideration of DirectModel's rendering needs.
DirectModel will create and maintain an additional spatial
organization that is optimized for rendering. These multiple
organizations do not significantly increase the memory or
disk usage of DirectModel because the actual geometry,
by far the largest component, is multiply referenced, not
duplicated.
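The idea of two organizations that reference the same
geometry can be sketched as follows. The part names, the
subsystem labels, and the front/rear split are invented for
illustration; DirectModel's actual data structures are not
described in this article.

```python
from dataclasses import dataclass, field

@dataclass
class Geometry:
    name: str
    subsystem: str       # logical owner, for example "drivetrain"
    x: float             # position along the length of the car (hypothetical units)
    polygons: list = field(default_factory=list)

def logical_organization(parts):
    """Group parts by the subsystem they belong to."""
    groups = {}
    for part in parts:
        groups.setdefault(part.subsystem, []).append(part)
    return groups

def spatial_organization(parts, split_x):
    """Group the same part objects by where they sit in space."""
    regions = {"front": [], "rear": []}
    for part in parts:
        regions["front" if part.x < split_x else "rear"].append(part)
    return regions

parts = [
    Geometry("engine", "drivetrain", 0.8),
    Geometry("rear axle", "drivetrain", 3.6),
    Geometry("front seat", "interior", 1.9),
    Geometry("trunk liner", "interior", 4.1),
]
logical = logical_organization(parts)
spatial = spatial_organization(parts, split_x=2.5)
```

Both dictionaries hold references to the same four Geometry
objects, so adding the second organization costs only the
small index structures, not a second copy of the polygons.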

The Problem of Motion 

Object motion, both predefined and interactive, is critical
to many technical applications. In mechanical design, for
example, users want to see suspension systems moving,
engines rocking, and pistons and valves in motion. To use
a virtual prototype for manufacturing planning, motion is
mandatory. Assembly sequences can be verified only by
observing the motion of each component as it moves into
place along its prescribed path. Users also want to grab
an object or subassembly and move it through space,
while bumping and jostling the object as it interferes with
other objects in its path. In short, motion is an essential
component for creating the level of realism necessary for
full use of digital prototypes.

DirectModel supports this demand for adding motion to
3D models in several ways. Because DirectModel does not
force an application to create a model graph that is
optimized for fast rendering, it can instead create one that is
optimized for managing motion. Parts that are physically
connected in real life can be connected in the model graph,
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allowing movement to cascade easily through all of the 
affected parts. In addition, the data structures and algo- 
rithms used by DirectModel to optimize the model graph 
for rendering are designed for easy incremental update 
when some portion of the application's model graph
changes. 
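How movement cascades through connected parts can be shown
with a small parent-child hierarchy. This is a sketch that
uses translations only; the Part class and the offsets are
hypothetical, and a real model graph would carry full
transformation matrices.

```python
class Part:
    def __init__(self, name, offset, parent=None):
        self.name = name
        self.offset = offset      # translation relative to the parent part
        self.parent = parent

    def world_position(self):
        """Accumulate offsets from this part up to the root of the graph."""
        if self.parent is None:
            return self.offset
        px, py, pz = self.parent.world_position()
        ox, oy, oz = self.offset
        return (px + ox, py + oy, pz + oz)

    def move_by(self, dx, dy, dz):
        """Move this part; every part connected below it follows automatically."""
        ox, oy, oz = self.offset
        self.offset = (ox + dx, oy + dy, oz + dz)

engine = Part("engine", (1.0, 0.5, 0.0))
piston = Part("piston", (0.0, 0.25, 0.0), parent=engine)
valve = Part("valve", (0.0, 0.125, 0.0), parent=piston)

before = valve.world_position()
engine.move_by(0.0, 0.0625, 0.0)   # rock the engine slightly
after = valve.world_position()
```

Only the engine's offset is modified, yet the piston and
valve both end up in new world positions because they are
connected beneath it in the graph.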

Models as Databases 

3D models containing millions of polygons with a rich set
of rendering attributes and metadata can easily require
several gigabytes of data. Models of this size are
frequently too big to be completely held in main memory,
which makes it particularly challenging to support
smooth interactivity.

DirectModel solves this problem by treating the model as a
database that is held on disk and incrementally brought in
and out of main memory as necessary. Elements of the
model, including individual level-of-detail representations,
must come from disk as they are needed and be removed
from main memory when they are not needed. In this way
memory can be reserved for the geometric representations
currently of interest. DirectModel's large model
capability has as much to do with rapid and intelligent
database interaction as with rendering optimization.
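One common way to bring elements in and out of memory on
demand is a least-recently-used cache in front of the disk.
The sketch below assumes that policy purely for
illustration; the article does not say which replacement
policy DirectModel uses, and the class name, keys, and sizes
are hypothetical.

```python
from collections import OrderedDict

class ModelCache:
    def __init__(self, load_from_disk, capacity_bytes):
        self._load = load_from_disk
        self._capacity = capacity_bytes
        self._resident = OrderedDict()   # key -> bytes, least recently used first
        self._used = 0

    def get(self, key):
        """Return a representation, reading it from disk only if it is not resident."""
        if key in self._resident:
            self._resident.move_to_end(key)      # mark as recently used
            return self._resident[key]
        data = self._load(key)
        self._resident[key] = data
        self._used += len(data)
        # Evict the least recently used entries until the budget is met again.
        while self._used > self._capacity and len(self._resident) > 1:
            _, evicted = self._resident.popitem(last=False)
            self._used -= len(evicted)
        return data

    def resident_keys(self):
        return list(self._resident)

# A dictionary stands in for the on-disk model database.
disk = {"wing/lod0": bytes(600), "wing/lod1": bytes(100), "engine/lod0": bytes(500)}
reads = []

def load_from_disk(key):
    reads.append(key)
    return disk[key]

cache = ModelCache(load_from_disk, capacity_bytes=1000)
cache.get("wing/lod0")
cache.get("engine/lod0")   # exceeds the budget, so wing/lod0 is evicted
cache.get("wing/lod1")     # the coarser wing representation fits
cache.get("engine/lod0")   # already resident: no disk read
```

Keying the cache by individual level-of-detail
representation lets a coarse version stay resident while
the detailed version of the same part is paged out.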

Interactive versus Batch-Mode Data Preparation 

Applications that deal with large 3D models have a wide 
range of capabilities. One application may be simply an 
interactive viewer of large models that are assembled from 
existing data. Another application may be a 3D editor (for 
example, a solid modeler) that supports designing me- 
chanical parts within the context of their full assembly. 
Consequently, an application may acquire and optimize a 
large amount of 3D geometry all at once, or the parts of 
the model may be created little by little. 

DirectModel supports both of these scenarios by allowing
model creation and optimization to occur either
interactively or in batch mode. If an application has a great deal
of raw geometry that must be rendered, it will typically
choose to provide a batch-mode preprocessor that builds
the model graph, invokes the sorting and simplification
algorithms, and then saves the results. An interactive
application can then load the optimized data and immediately
allow the user to navigate through the data. However, if
the application is creating or modifying the elements of
the model at a slow rate, then it is reasonable to sort and
optimize the data in real time. Hybrid scenarios are also
possible where an interactive application performs
incremental optimization of the model with any spare CPU
cycles that are available.

The important thing to note in these scenarios is that 
DirectModel does not make a strong distinction between 
batch and interactive operations. All operations can be 
considered interactive and the application developer is 
free to employ them in a batch manner when appropriate.

Extensibility 

Large 3D models used by technical applications have
different characteristics. Some models are highly regular
with geometry laid out on a fixed grid (for example,
rectangular buildings with rectangular rooms) whereas
others are highly irregular (for example, an automobile
engine with curved parts located at many different
orientations). Some models have a high degree of
occlusion where entire parts or assemblies are hidden from
many viewing perspectives. Other models have more
holes through them allowing glimpses of otherwise
hidden parts. Some models are spatially dense with many
components packed into a tight space, whereas others
are sparse with sizable gaps between the parts.

These vast differences impact the choice of effective
optimization and rendering algorithms. For example, highly
regular models such as buildings are amenable to
preprocessing to determine regions of visibility (for example,
rooms A through E are not visible from any point in room
Z). However, this type of preprocessing is not very
effective when applied to irregular models such as an engine.
In addition, large model visualization is a vibrant field of
research with innovative new algorithms appearing
regularly. The algorithms that seem optimal today may appear
very limiting tomorrow.

DirectModel's flexible architecture allows application
developers to choose the right combination of techniques,
including creating new algorithms to extend the system's
capabilities. All of the DirectModel functions, such as its
culling algorithms, representation generators,
tessellators, and picking operators, are extensible in this way.
Extensions fit seamlessly into the algorithms they
extend, indistinguishable from the default capabilities
inherent to the toolkit.

In addition, DirectModel supports mixed-mode rendering
in which an application uses DirectModel for some of its 
rendering needs and calls the underlying core graphics 
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API directly for other rendering operations. Although Di- 
rectModel can fulfill the complete graphics needs of many 
applications, it does not require that it be used exclusively. 

Multiplatform Support

A variety of systems are commonly used for today's
technical 3D graphics applications, ranging from high-end
personal computers through various UNIX-based
workstations and supercomputers. In addition, several 3D
graphics APIs and architectures are either established or
emerging as appropriate foundations for technical
applications. Most developers of technical applications support a
variety of existing systems and must be able to migrate
their applications onto new hardware architectures as the
market evolves.

DirectModel has been carefully designed and implemented 
for optimum rendering performance on multiple platforms 
and operating systems. It presumes no particular graphics 
API and is designed to select at run time the graphics API 
best suited to the platform or specified by the application. 
In addition, its core rendering algorithms dynamically 
adapt themselves to the performance requirements of the 
underlying graphics pipeline. 



Conclusion 



The increasing use of 3D graphics as a powerful tool for
solving technical problems has led to an explosion in the
complexity of problems being addressed, resulting in 3D
models containing millions or even billions of polygons.
Unfortunately, many of the applications and 3D graphics
systems in use today are built on architectures designed
to handle only a few thousand polygons efficiently.
These architectures are incapable of providing
interactivity with today's large technical data sets.

This problem has created a strong demand for new
graphics architectures and products that are designed for
interactive rendering of large models on affordable systems.
Hewlett-Packard is meeting this demand with
DirectModel, a cross-platform toolkit that enables interaction
with large, complex 3D models.
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Three graphics accelerator products with different levels of performance are 
based on varying combinations of five custom integrated circuits. In addition, 
these products are the first ones from Hewlett-Packard to provide native 
acceleration for the OpenGL API.



The VISUALIZE fx family of graphics subsystems consists of three
products, fx 6, fx 4, and fx 2, and an optional hardware texture mapping module.
These products are built around a common architecture using the same
custom integrated circuits. The primary difference between these controllers
is the number of custom chips used in each product (see Table 1).



Table 1
Number of custom chips in the different VISUALIZE fx products

Product    Texture Chip    Geometry Chip    Raster Chip
fx 2
fx 4
fx 6                                        4



A chip-level block diagram of the VISUALIZE fx 6 product is shown in Figure 1.
This is the most complex configuration and also the one with the highest
performance in the product line. The VISUALIZE fx 4 and the VISUALIZE fx 2
products use subsets of the chips used in the fx 6. The fx 6 and fx 4 subsystems
have support for the optional hardware-accelerated texture map module,
which contains a local texture cache for storage of texture map images. If the
texture accelerator is not present, the bus between the interface chip and the
first raster chip is directly connected.
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Figure 1 

A chip-level diagram of the VISUALIZE fx 6 product. 
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Geometry Chip 

• 3D Geometry and Lighting Acceleration 

Texture Chip 

• Texture Rasterization 

• Texture Map Cache Controller 

• Texture Memory Control 

• Texture Interpolation 



Interface Chip 

• I/O Buffering 

• 3D Geometry Workload Distribution 
and Concentration 

• 2D and 3D Data Path Arbitration 

• 2D Acceleration 

• YUV to RGB Conversion Support 

• Pixel Level Pan and Zoom 

• Pixel Level Image Rotations 



Raster Chip 

• Fragment Processing 

• Frame Buffer Control Functions 

Video Chip 

• Color Lookup Tables 

• Video Timing 

• Digital-to-Analog Conversion 

• Video-Out Data 



Interface Chip 

The interface chip provides a PCI 2.1 (also referred to as
PCI 2X) compliant interface. It operates at up to 66 MHz
in 64-bit mode. Special efforts have been made in the
design of the buffering and the interface to the PCI. As a
result, the driver is able to sustain writes of 3D geometry
commands to the PCI at almost the theoretical maximum
rates that could be computed for the PCI. The article on
page 51 discusses PCI capability.

PCI = Peripheral Component Interconnect
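As a rough guide to what a theoretical maximum rate means
here, the peak transfer rate of a parallel bus is its clock
rate multiplied by its width. The sketch below assumes a
nominal 66-MHz clock and a 64-bit data path, and ignores
arbitration, address phases, and other protocol overhead, so
it gives an upper bound rather than a measured figure.

```python
def peak_bytes_per_second(clock_hz, bus_width_bits):
    """Upper bound on throughput: one full-width transfer on every clock."""
    return clock_hz * bus_width_bits // 8

pci_66_64 = peak_bytes_per_second(66_000_000, 64)   # 64-bit mode at 66 MHz
pci_33_32 = peak_bytes_per_second(33_000_000, 32)   # conventional 32-bit, 33-MHz PCI

print(pci_66_64 // 1_000_000, "Mbytes/s versus", pci_33_32 // 1_000_000, "Mbytes/s")
```

Doubling both the clock rate and the width gives four times
the peak rate of conventional 32-bit, 33-MHz PCI.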
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Occlusion Culling 



The HP fast-break program (page 8) enabled us to understand 
customer requirements by analyzing what is important in 
OpenGL graphics today. As a result, we developed a technol- 
ogy called occlusion culling as an extension to OpenGL and 
implemented it in the VISUALIZE fx graphics hardware. 

We found that the data sets many graphics workstation
customers are trying to visualize are very complex. These data
sets have large numbers of small, complex components that
are not always visible in the final images. For instance, when 
rendering an airplane, all of the MCAD parts are present in the 
data set represented by potentially millions of polygons that 
must be processed. However, when this airplane is viewed 
from the outside only the outer surfaces are visible, not the fan 
blades of the engine or the seats or bulkheads in the interior. 

In a traditional 3D z-buffered graphics system, all polygons in
a scene must be processed by the graphics pipeline because it
is not known a priori which polygons will be visible and which
ones will be occluded (not visible). The notion of occlusion
culling, or removal of occluded objects, has been talked about
in the research community for several years. However,
implementations tend to be in software where the performance is
not at a satisfactory level.

In the VISUALIZE fx series of graphics devices, HP developed
a very efficient algorithm that tests objects for visibility.
An application program can very quickly use the occlusion
culling visibility test to determine if a simple bounding box
representation of a more complex part is visible. Since a
bounding box, or more generally a bounding volume,
completely encloses the more complex part, it is possible to know
a priori that if the bounding volume is not visible then the
complex part it encloses is not visible. Thus, the part that is
not visible does not need to be processed through the graphics
pipeline. The real benefit of occlusion culling comes when a
very complex part consisting of many vertices can be rejected,
avoiding the expenditure of valuable time to process it.
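The bounding-volume test can be illustrated in software with
a small depth buffer. This is only a sketch of the idea: it
is not the HP hardware algorithm or the OpenGL extension
interface, and it simplifies each part to a screen-space
rectangle at a single depth. All names and values are
hypothetical.

```python
def bounding_box_visible(depth_buffer, box, nearest_depth):
    """True if any pixel under the box is farther away than the box's nearest point."""
    x0, y0, x1, y1 = box
    return any(nearest_depth < depth_buffer[y][x]
               for y in range(y0, y1 + 1)
               for x in range(x0, x1 + 1))

def draw(depth_buffer, box, depth):
    """Stand-in for rendering the full part: keep the nearer depth at each pixel."""
    x0, y0, x1, y1 = box
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            depth_buffer[y][x] = min(depth_buffer[y][x], depth)

def render(parts, width, height):
    depth_buffer = [[1.0] * width for _ in range(height)]   # 1.0 is the far plane
    drawn = []
    for name, box, nearest_depth in parts:
        # Cheap test on the bounding box before paying for the complex part.
        if bounding_box_visible(depth_buffer, box, nearest_depth):
            draw(depth_buffer, box, nearest_depth)
            drawn.append(name)
    return drawn

parts = [
    ("fuselage skin", (0, 0, 7, 7), 0.2),   # drawn first, covers the whole view
    ("cabin seat", (2, 2, 3, 3), 0.5),      # behind the skin
    ("antenna", (3, 0, 3, 0), 0.1),         # in front of the skin
]
drawn = render(parts, 8, 8)
```

The test is most effective when large occluders are drawn
first, because later parts are then compared against a depth
buffer that is already well populated.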

For very complex data sets, such as the airplane mentioned 
above or an automobile, a tremendous performance increase 
can be realized by using the HP occlusion culling technology. 
To date, several ISVs have begun using occlusion culling in 
their applications and are seeing a 25 to 100 percent increase
in graphics performance. This magnitude of performance bene- 
fit typically costs a customer several thousand dollars for the 
extra computational horsepower. HP includes this technology 
as standard in all VISUALIZE fx series graphics accelerators, 
giving even better price and performance results to our 
customers. 

The future of 3D graphics will continue toward visualizing ever 
more complex objects and environments. Occlusion culling 
together with HP's DirectModel technology (page 19) are 
well positioned to be industry leaders in providing the technol- 
ogy for 3D modeling applications. 



The primary responsibility of the interface chip is to
separate the streams of data that arrive from the host CPU into
three paths and arbitrate access among those paths.

3D Path. Typically data from the host CPU looks very 
much like the OpenGL API functions themselves. Data 
following this first path is routed to the geometry chips.
The geometry chips process the data and return the re- 
sults to the interface chip. These results are then sent on 
to the texture chips or directly to the raster chips if the 
texture mapping subsystem is not installed. In either case 
the data is transmitted to and through all the texture and 
raster chips in the system. 

Unbuffered Path. This path passes data directly through
the interface chip to the texture and raster chips. This
provides a bypass method that allows traffic to get around
other pending operations. An example would be a texture
cache download that is required to complete a primitive
that is currently being rasterized, a situation that would
lead to deadlock without the unbuffered path.

2D Path. This path runs directly through the interface chip 
to the texture and raster chips. The 2D path differs from 
the unbuffered path in the way its priority is handled. The 
interface chip manages priority among the three paths as 
they all converge on the same set of wires between the 
interface chip and the first texture chip. The unbuffered 
path goes directly through the interface chip to those 
wires and has priority over the other two paths. Data 
targeting the 2D path is held off until all preceding 3D 
work in the geometry chip has been flushed through to 
the first texture chip.






There is also special circuitry in the interface chip that is
used to accelerate many operations commonly done by
X11 or other 2D APIs.

Buses

The three primary buses in the system are each run at
200 MHz, allowing sustainable transfer rates of more
than 800 Mbytes per second. To control the loading on
the interconnections for these buses, they are built as
point-to-point connections from one chip to the next.

Each chip receives the signals and then retransmits them
to the next chip in the sequence. This requires more pins
on each part, but limits the number of loads on each wire
to a single receiver as well as limiting the wiring length
that signals must traverse. This allows for reliable
communications despite the high frequency of the buses.

The first of these three buses distributes work to the
geometry chips. This bus starts at the interface chip
and runs through all the geometry chips in the system.
Each geometry chip monitors the data stream as it flows
through the bus and picks off work to operate upon based
on an algorithm that selects the least busy geometry chip.

The second of these buses starts at the last geometry chip
and passes through the others back to the interface chip.
The results of the work done by the geometry chips are
placed on this bus in the same sequence as the work was
moved along the first bus. This strict ordering control
prevents certain artifacts from showing up in the final image.
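The combination of least-busy distribution and strictly
ordered results can be modeled in software. The sketch below
is an analogy only: the article does not describe the
hardware mechanism, the cost values are invented, and the
model never retires completed work from a chip's pending
total.

```python
import heapq

def distribute(work_costs, chip_count):
    """Assign each work item, in arrival order, to the chip with the least pending work."""
    pending = [0] * chip_count
    assignment = []
    for cost in work_costs:
        chip = min(range(chip_count), key=lambda c: pending[c])
        assignment.append(chip)
        pending[chip] += cost
    return assignment

def merge_in_order(completions):
    """Yield results in original sequence order.

    completions: iterable of (sequence_number, result) pairs in the order the
    chips finish them. Results that finish early are held back until every
    earlier sequence number has been emitted.
    """
    waiting = []
    next_sequence = 0
    for sequence, result in completions:
        heapq.heappush(waiting, (sequence, result))
        while waiting and waiting[0][0] == next_sequence:
            yield heapq.heappop(waiting)[1]
            next_sequence += 1

assignment = distribute([5, 1, 1, 1], chip_count=2)
ordered = list(merge_in_order([(1, "b"), (2, "c"), (0, "a"), (3, "d")]))
```

A large primitive on one chip lets the other chip run ahead,
so small primitives can finish first; holding their results
until the large one completes preserves the order in which
the application issued them.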

The third bus ties the interface chip to the texture and
frame buffer subsystems. It is wired in a loop that goes
back to the interface chip from the last chip in the chain.
3D operations typically flow from the interface chip to
the chips along this bus, and when they eventually get
back to the end of the loop, they are thrown away.

For 2D operations, such as moving blocks of pixels
around the frame buffer, the operation of the third bus is
somewhat different. The movement of pixel data operates
as a sequence of reads followed by a sequence of writes.
The reads cause data to be dumped from the frame buffer
locations onto the bus and the results travel back to the
interface chip. This data is then associated with new
addresses and sent as writes back down the bus, ending
up back at the frame buffer but in different locations.
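The read-then-write sequence can be mimicked in a few lines.
In this sketch the frame buffer is a list of rows and the
function name is hypothetical; it illustrates the ordering
of operations, not the bus protocol.

```python
def move_block(frame_buffer, src, dst, width, height):
    """Move a width x height block of pixels from src to dst, both given as (x, y)."""
    sx, sy = src
    dx, dy = dst
    # Read phase: collect every source pixel before anything is modified.
    pixels = [frame_buffer[sy + j][sx + i]
              for j in range(height) for i in range(width)]
    # Write phase: pair each value with its new address and store it.
    for index, value in enumerate(pixels):
        j, i = divmod(index, width)
        frame_buffer[dy + j][dx + i] = value

frame_buffer = [[0, 1, 2, 3, 4, 5]]
move_block(frame_buffer, src=(0, 0), dst=(2, 0), width=4, height=1)
```

Because all reads are gathered before any write is issued,
the move gives the expected result even when the source and
destination regions overlap, as they do in this example.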

Besides the three primary buses mentioned above,
there are three secondary buses in the system. The first
bus connects the interface chip to the video chip. This
provides video control, download of color maps, and
cursor control. The second bus is a connection from each
raster chip to the video chip. This path is used to provide
video refresh data to display frame buffer contents. The
final secondary bus is a connection from each texture
chip to two of the raster chips. This path allows the flow
of filtered texture data into the raster chips for
combination with nontexture fragment data.

Geometry Chip 

The geometry and lighting chips are responsible for taking
in geometric primitives (points, lines, triangles, and
quadrilaterals) and executing all the operations associated
with the transform stage of the graphics pipeline (see the
article on page 9 for more about the graphics pipeline).
These operations include:

■ Transformation of the coordinates from model space to
eye space

■ Computing a vertex color based on the lighting state,
which consists of up to eight directional or positional
light sources

■ Texture map calculations that include:

□ Environment map calculations for texture mapping

□ Texture coordinate transformation

□ Linear texture coordinate generation

□ Texture projection

■ View volume clipping and clipping against six arbitrary
application-specified planes to determine whether a
primitive is completely visible, rejected because it is
completely outside the view area, or needs to be
reduced into its visible components

■ Perspective projection transformation to cause
primitives to look smaller the further away from
the eye they are

■ Setup calculations for rasterization in the raster chip.
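The three clipping outcomes in the list above can be
sketched as a classification of a primitive's vertices
against a set of planes. The plane representation (an inward
normal and an offset) and the unit-cube view volume are
assumptions chosen for illustration, not details of the
geometry chip.

```python
def classify(vertices, planes):
    """Return "visible", "rejected", or "clipped" for one primitive.

    planes: ((nx, ny, nz), d) pairs; a point is inside a plane when
    nx*x + ny*y + nz*z + d >= 0.
    """
    needs_clipping = False
    for (nx, ny, nz), d in planes:
        inside = [nx * x + ny * y + nz * z + d >= 0 for x, y, z in vertices]
        if not any(inside):
            return "rejected"        # every vertex is outside this plane
        if not all(inside):
            needs_clipping = True    # the primitive straddles this plane
    return "clipped" if needs_clipping else "visible"

# Hypothetical view volume: the cube from -1 to 1 on each axis.
view_volume = [
    ((1, 0, 0), 1), ((-1, 0, 0), 1),
    ((0, 1, 0), 1), ((0, -1, 0), 1),
    ((0, 0, 1), 1), ((0, 0, -1), 1),
]

inside_triangle = classify([(0, 0, 0), (0.5, 0, 0), (0, 0.5, 0)], view_volume)
outside_triangle = classify([(2, 0, 0), (3, 0, 0), (2, 1, 0)], view_volume)
straddling_triangle = classify([(0, 0, 0), (2, 0, 0), (0, 0.5, 0)], view_volume)
```

The test is conservative: a primitive that lies outside the
volume without being entirely outside any single plane is
reported as "clipped", and the subsequent clipping step
would then produce no visible pieces. Application-specified
planes can simply be appended to the list.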

There were some interesting problems to solve in the
design of the distribution and coalescing of work up and
down the geometry chip daisy chain. Examples include load
balancing, maintaining strict order in the output stream,
and ensuring that operations, such as binding of colors
and normals to vertices, perform as required by OpenGL.



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Fast Virtual Texturing 



Texture mapping, which is wrapping a picture over a
three-dimensional object, has been used over the years as a key
feature to enhance photorealism, reduce data set sizes,
perform visual analysis, and aid in simulations (see Figure 1).
Since texturing calculations are computationally expensive 
and memory access for large textures can be prohibitively 
slow, various workstation graphics vendors have provided 
hardware-accelerated texture mapping as a key differentiator 
for their product. 

A primary drawback of these attempts at hardware
acceleration is that dedicated local hardware texture memory is limited
in size and is expensive. To take advantage of the
performance boost, graphics applications were constrained to
textures that fit in the local hardware texture memory. In other
words, the application was responsible for managing this
hardware resource.

Figure 1
A 3D textured skull. The VISUALIZE fx 4 and fx 6 subsystems
support a texture map acceleration option. Pictured here
is the use of 3D texture mapping OpenGL extensions with
this option. This feature allows visualization of 3D data
sets such as MRI images.

Noticing this obvious artificial application limitation in texturing 
functionality, performance, and portability, Hewlett-Packard 
introduced, in the VISUALIZE-48, a new concept in hardware 
texture mapping called virtual texture mapping. Virtual texture 
mapping uses the dedicated local hardware texture memory 
as a true texture cache, swapping in and out of the cache the 
portions of textures that are needed for rendering a 3D image. 
Thus, for texturing applications, these limitations were elimi- 
nated. The application could define and use a texture map of 
any size (up to a theoretical limit of 32K texels × 32K texels*)
that would be hardware accelerated, eliminating the need for 
the application to be responsible for managing local texture 
memory. 

Using the local hardware texture memory as a cache also
means that this memory holds only the portions of the texture
maps needed to render the image. This efficiency translates
to more and larger texture maps being hardware accelerated
at the same time. Applications that previously could not run
because of texture size limits can now run because of the
unlimited virtual texture size. Also, with only the used portions
of the texture map being downloaded to the cache, far
less graphics bus traffic occurs.
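
The caching idea can be illustrated with a minimal sketch. The sidebar does not say how textures are divided or which replacement policy the hardware and daemon use, so the tile granularity, the least-recently-used policy, and all names below are assumptions made purely for illustration.

```python
from collections import OrderedDict


class TextureTileCache:
    """Hypothetical LRU cache of texture tiles standing in for local texture memory."""

    def __init__(self, capacity_tiles, fetch_tile):
        self.capacity = capacity_tiles
        self.fetch_tile = fetch_tile   # callback that downloads one tile from host memory
        self.tiles = OrderedDict()     # (texture_id, tile_x, tile_y) -> tile data
        self.misses = 0

    def lookup(self, texture_id, tile_x, tile_y):
        key = (texture_id, tile_x, tile_y)
        if key in self.tiles:
            self.tiles.move_to_end(key)        # mark as most recently used
            return self.tiles[key]
        self.misses += 1                       # tile is not resident: download it
        if len(self.tiles) >= self.capacity:
            self.tiles.popitem(last=False)     # evict the least recently used tile
        data = self.fetch_tile(texture_id, tile_x, tile_y)
        self.tiles[key] = data
        return data
```

Only tiles that rendering actually touches are ever downloaded, which is the source of the reduced bus traffic described above.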

The system design of virtual texture mapping involved changes
in the HP-UX operating system to support graphics interrupts,
onboard firmware support for these interrupts, the introduction
of an asynchronous texture interrupt managing daemon process,
and the associated texturing hardware described in this
article. Having a centralized daemon process manage the
cache allows for cache efficiency, parallel handling of texture
downloads while 3D graphics rendering is occurring, and sharing
of textures among graphics contexts.

The VISUALIZE fx 4 and VISUALIZE fx 6 texture mapping
options incorporate the second-generation advances in virtual
texture mapping. Full OpenGL 1.1 texture map hardware support
has brought about dramatic improvements in texture
map download performance and switching between texture
maps, and new extended features such as 3D texture mapping,
shadows (Figure 2), and proper specular lighting on textures
(Figure 3). These features have made these products very
appealing systems for texturing applications on workstation
graphics.

The texture mapping performance on these systems is very
competitive. The VISUALIZE fx 6 texture fill rate is about twice
that of the VISUALIZE fx 4 texture option. However, fill rates
alone do not describe how these systems perform in a true
application environment. Comparisons using applications that
texture aggressively show two to three times the performance
of similarly priced graphics workstation products.

* A texel is one element of a texture.

Figure 2

A shadow texture image.

Figure 3

A specular lit texture image. Correct specular lighting of
textured images can be achieved with VISUALIZE fx 4 and
fx 6 texture mapping options.

The output of the geometry chip's daisy chain is passed
back through the interface chip. Generally, for triangle-based
primitives, the output takes the form of plane equations.
As these floating-point plane equations are returned
from the geometry chip to the interface chip and passed
on to the texture chips, values sent to certain addressed
locations in the interface chip are converted from
floating-point to fixed-point values as they pass through.
These fixed-point values are in a form the raster chips
need to rasterize the primitive.

The daisy-chain design allows up to eight of the geometry
chips to be used, although only three are applied in the
case of the VISUALIZE fx 6 product at this time.
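
The floating-point to fixed-point conversion that the interface chip applies to plane equation values can be sketched as follows. The article does not give the fixed-point format, so the 16.16 layout, the saturation behavior, and the function names here are assumptions for illustration only.

```python
def float_to_fixed(value, frac_bits=16, total_bits=32):
    """Convert a float to a signed fixed-point integer (assumed 16.16 format)."""
    scaled = int(round(value * (1 << frac_bits)))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))   # saturate rather than wrap around


def fixed_to_float(fixed, frac_bits=16):
    """Inverse conversion, useful for checking the precision that was kept."""
    return fixed / (1 << frac_bits)
```

With these assumptions, `float_to_fixed(1.5)` yields the integer 98304, which is 1.5 scaled by 65,536.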





Texture Chip 

The texture chip is responsible for accelerating texture
mapping operations. Toward this end, it performs three
basic functions:

■ Maintains a cache of texture map data, requesting cache
updates for texture values required by current rendering
operations as needed (see "Fast Virtual Texturing" on
page 32)

■ Generates perspective-corrected texture coordinates
from plane equations representing triangles, points, or
lines

■ Fetches and filters the texture data as specified by the
application, based on whether the texture needs to be
magnified or minified to fit the geometry it is being
mapped to, and passes the result on to the raster chips.
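
As an illustration of the second function, perspective-corrected texture coordinates can be derived from plane equations by interpolating s/w, t/w, and 1/w linearly in screen space and dividing per pixel. This is the standard technique, not a description of the chip's internal arithmetic, and the plane representation (a, b, c) and function names are assumptions.

```python
def eval_plane(plane, x, y):
    """Evaluate a plane equation a*x + b*y + c at a pixel position."""
    a, b, c = plane
    return a * x + b * y + c


def perspective_correct_st(s_over_w, t_over_w, one_over_w, x, y):
    """Recover (s, t) at a pixel from planes for s/w, t/w and 1/w."""
    q = eval_plane(one_over_w, x, y)
    return eval_plane(s_over_w, x, y) / q, eval_plane(t_over_w, x, y) / q


# Hypothetical span: s runs from 0 (w = 1) at x = 0 to 1 (w = 2) at x = 10.
s_plane = (0.05, 0.0, 0.0)    # s/w goes from 0 to 0.5
t_plane = (0.0, 0.0, 0.0)     # t is constant at 0
q_plane = (-0.05, 0.0, 1.0)   # 1/w goes from 1 to 0.5
s_mid, t_mid = perspective_correct_st(s_plane, t_plane, q_plane, 5, 0)
```

At the midpoint of the span the corrected coordinate is one third, whereas a simple linear interpolation of s would give one half.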
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Raster Chip 

The raster chip rasterizes the geometry into the frame
buffer. This means it determines which pixels are to be
potentially modified and, if so, whether they should be
modified based on various current state values (including
the contents of the z buffer). The raster chip also controls
access to the various buffers that make up the frame
buffer. These include the image buffer for storing the image
displayed on the screen (potentially two buffers if double
buffering is in effect), an overlay buffer that contains images
that overlay the image buffer, the depth or z buffer
for hidden surface removal, the stencil buffer,* and an
alpha buffer** on the VISUALIZE fx 6. To accomplish its
work the raster chip performs four basic functions:

■ Rasterize primitives described as points, lines, or
triangles

■ Apply fragment operations as defined by OpenGL (such
as blending and raster operations)

■ Control access to buffer memory, including all
the buffers described earlier

■ Refresh the data stream for the video chip, including
handling windows and overlays.
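
A per-fragment decision of the kind described above can be sketched in a few lines. The sketch assumes a "less than" depth comparison and source-alpha blending, which are common OpenGL settings but only two of many possible states; the buffer layout and names are hypothetical.

```python
def process_fragment(color_buf, depth_buf, x, y, frag_color, frag_z, alpha=1.0):
    """Depth-test one fragment and blend it into the image buffer if it passes."""
    if frag_z >= depth_buf[y][x]:
        return False                      # hidden behind what is already stored
    depth_buf[y][x] = frag_z              # fragment is nearer: update the z buffer
    dst = color_buf[y][x]
    color_buf[y][x] = tuple(
        alpha * s + (1.0 - alpha) * d     # blend new value with the current pixel
        for s, d in zip(frag_color, dst)
    )
    return True
```

A fragment that fails the depth test leaves both buffers untouched, which is how the z buffer achieves hidden surface removal.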

* A stencil buffer is per-pixel data that can be updated when pixel data is written and used
to restrict the modification of the pixel.

** An alpha buffer contains per-pixel data that describes coverage information about the
pixel and can be used when blending new pixel values with the current pixel value.

Video Chip

The video chip provides video functions for controlling
the data flow from the frame buffer to the display and
mapping data from values to color. The features of the
video chip include:

■ Data mapping to colors:

- Two independent 4096-by-24-bit lookup tables

- Four independent 256-by-3-by-8-bit lookup tables
for image planes

- A bypass path for 24-bit true color data

- Two independent 256-by-8-bit lookup tables for
overlay planes

■ Digital-to-analog conversion

■ Video timing

■ Video output.
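
The difference between the lookup-table path and the bypass path can be shown with a small sketch. The table contents, the 0xRRGGBB packing order, and the function names are assumptions chosen for illustration; the article does not describe the chip's data formats.

```python
def make_gray_ramp(entries=256):
    """Build a hypothetical 256-by-3-by-8-bit table mapping index i to gray level i."""
    return [(i, i, i) for i in range(entries)]


def map_through_lut(indices, lut):
    """Indexed pixels: each stored value selects a color from the lookup table."""
    return [lut[i] for i in indices]


def bypass_true_color(pixel):
    """True color pixels skip the table; assumed packing is 0xRRGGBB."""
    return ((pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF)
```

Changing a table entry recolors every pixel that references it without touching the frame buffer, which is why lookup tables remain useful alongside a true color path.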



Conclusion 



The VISUALIZE fx family of products currently holds a
substantial lead not only in price/performance measurements
but also in performance independent of cost.

For information regarding how these systems compare
against the competition, visit the SPEC (an industry-standard
body of benchmarks) web page at:

http://www.spec.org/gpi." 
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HP Kayak: A PC Workstation with Advanced 
Graphics Performance 



World-leading 3D graphics performance, normally only found in a UNIX* 
workstation, is provided in a PC workstation platform running the Windows 
NT* operating system. This system was put together with a time to market of 
less than one year from project initiation to shipment. 



Computer graphics workstations are powerful desktop computers used
by a variety of technical professionals to perform their day-to-day work.
Traditionally, such computers have run a version of the UNIX operating
system. In the past year, however, workstations featuring Intel processors such
as the Pentium™ Pro and Pentium II and running the Microsoft® Windows NT
operating system have begun to gain ground in both capability and market
share. Hewlett-Packard has historically been a leader in the UNIX workstation
business. In February 1997, Hewlett-Packard began a project to put its
high-performance workstation graphics into a PC workstation platform.

Technical Challenges 

Fitting HP workstation graphics into a Windows NT platform was not an easy
task. The task was made more exciting with the addition of schedule pressure.
The schedule gave us only four months to reach functional completion and
only two months after that to finish the quality assurance process. This schedule
was made even more challenging because the hardware was not yet complete.
It was difficult at times to distinguish software defects from hardware defects.
This article describes how we overcame some of the challenges we encountered
while implementing this project.

The Hardware

The hardware for the HP Kayak workstation (Figure 1) is based on the
VISUALIZE fx 4 graphics subsystem for real-time 3D modeling (see the article
on page 28). However, a couple of changes were necessary. First, to achieve
the performance available in the graphics hardware, the
bus interface had to be changed from the standard Peripheral
Component Interconnect (PCI) to the accelerated
graphics port (AGP),† since no commodity PC chipset
supported PCI 2X. With normal industry-standard PCI, we
would have been limited to 132 Mbytes/s for I/O, which
would have hurt our performance on several important
benchmarks. With the accelerated graphics port, the available
I/O bandwidth increased to 262 Mbytes/s.
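
The PCI figure follows from simple arithmetic on the bus parameters. The 33-MHz clock and 32-bit width used below are the standard PCI values rather than numbers taken from this article, and the AGP figure is used exactly as quoted above.

```python
def peak_bandwidth_mbytes_per_s(clock_mhz, bus_width_bits):
    """Peak transfer rate, assuming one transfer per clock cycle."""
    return clock_mhz * bus_width_bits // 8


# Assumed standard PCI parameters (not stated in the article): 33 MHz, 32 bits.
pci_peak = peak_bandwidth_mbytes_per_s(33, 32)

# AGP bandwidth as quoted in the article, in Mbytes/s.
agp_quoted = 262
speedup = agp_quoted / pci_peak
```

Under these assumptions the PCI peak comes out at 132 Mbytes/s, so the move to AGP roughly doubled the I/O bandwidth available to the graphics hardware.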

The second change necessary to the hardware was the
addition of industry-standard VGA graphics. During the
boot process of Windows NT, and at occasional intervals
after that, the computer will access VGA graphics registers
directly. To support this, a VGA daughtercard was created
that displays its graphics through the video feature connector
created for the UNIX video solution. The main graphics
board was modified slightly, making it possible to dynamically
switch between VGA graphics and VISUALIZE fx 4
graphics. Figure 2 shows a hardware block diagram for
an HP Kayak workstation.

† AGP is a bus that transfers data to and from a graphics accelerator.

Windows NT Driver Architecture 

The fact that the hardware for the HP Kayak workstation
is similar to the VISUALIZE fx 4 hardware, which is used with the
UNIX operating system, made the software effort much
easier. However, many significant hurdles had to be overcome
to get the software running under Windows NT.

The first challenge was the Windows NT device driver
architecture (Figure 3). On HP-UX, graphics device
drivers have a large amount of kernel support, allowing
them to access the graphics hardware directly from user-level
code without having to execute any special locking
routines. This direct hardware access (DHA) method is
not present on Windows NT. Instead, all accesses to the
hardware must be performed from the kernel (ring 0 in
Figure 3).



Figure 2 

A hardware block diagram for an HP Kayak workstation.

Fortunately, the VISUALIZE fx 4 architecture specifies a
buffered form of communication in which graphical commands
are placed into command data packets in a large
buffer in the hardware. It was a simple task to modify the
HP-UX drivers to access a software-allocated command
data packet buffer instead. When one of these software
buffers gets full, it is passed to the ring 0 driver, which forwards
the buffer to the hardware.
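
The buffering scheme can be pictured with a short sketch. The class, the word-based packet format, and the `submit` callback that stands in for the hand-off to the ring 0 driver are all hypothetical; the actual driver interfaces are not described in this article.

```python
class CommandBuffer:
    """Hypothetical user-level buffer that batches command data packets."""

    def __init__(self, capacity_words, submit):
        self.capacity = capacity_words
        self.submit = submit     # stands in for passing the buffer to the ring 0 driver
        self.words = []

    def put(self, packet):
        """Append one packet, flushing first if it would not fit."""
        if len(packet) > self.capacity:
            raise ValueError("packet larger than the whole buffer")
        if len(self.words) + len(packet) > self.capacity:
            self.flush()
        self.words.extend(packet)

    def flush(self):
        """Hand the accumulated commands to the kernel-side driver."""
        if self.words:
            self.submit(list(self.words))
            self.words.clear()
```

Batching in this way means that the expensive transition into the kernel happens once per buffer rather than once per graphics command.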

The lighter-shaded modules in Figure 3 represent the
libraries that were delivered by HP to support the VISUALIZE
fx 4 hardware. The libraries in ring 3 (Hpicd.dll and
Hpvisxdx.dll) were fairly straightforward ports of the
corresponding UNIX libraries libGL.sl and libddvisxgl.sl.
The libraries in ring 0 (Hpvisxmp.sys, Hpvisxnt.dll, and
Hpvisxkx.dll) had to be created from scratch to support the
Windows NT driver model. These modules make up about
30 percent of the size of the ring 3 modules.

Figure 3

The Windows NT device driver architecture.

Integration with 2D Windows NT Graphics

The second challenge was to integrate the 3D OpenGL
graphics support with the standard Windows NT graphical
device interface. Microsoft specifies two methods that can
be used to do this. The first, called a miniclient driver, is
a rasterization-level OpenGL driver that uses the Microsoft
OpenGL software pipeline for lighting and transformation.
This driver would have been easy to create,
but it would not have allowed us to take advantage of
the hardware transformation and lighting provided by
VISUALIZE fx 4.

The second method, called an installable client driver, is
a geometry-level OpenGL driver that leaves implementation
of the lighting and transformation pipeline up to the
driver writer. The driver allows us full access to all
OpenGL API routines. This is the route we chose because
we already had a full implementation of OpenGL,
which we had created to run on the HP-UX operating
system. This implementation was ported to the installable
client driver model over a span of several weeks, while
we added support for Windows NT multithreading. The
bulk of the VISUALIZE fx 4 graphical device interface
driver was written by a separate team of experts without
much consideration for 3D graphics acceleration. This
enabled them to get the Windows NT display driver running
in a short amount of time and allowed them to continue
enhancing 2D performance without severely impacting
the 3D device driver team. Some of the results of
these efforts are shown in Figure 4.

Integrating the Windows NT Driver with Ring 0 

A third challenge was to integrate the Windows NT driver
with the ring 0 portion of the OpenGL driver while maintaining
separate code bases for the different teams. We
decided to make our ring 0 driver a separately loadable
library. This decision kept the source code separate. It also
enabled much faster edit-compile-debug cycles, since it
allowed us to replace a portion of the ring 0 driver without
having to reboot the computer. However, the separation
added extra complexity because we had two very
different drivers accessing the same piece of hardware.
To solve this problem, we created a variable called a
hardware access token.

Figure 4

(a) A 3D image in a 2D environment. (b) Several 3D programs in a 2D environment.

Each driver has a special token
that it places in the hardware access token to indicate
that it was the last driver to access the hardware. When a
driver detects that the token is not its own, it executes
procedures known as context save and context restore.
The context save reads all applicable hardware state information
from the device into software buffers. The context
restore places the previously saved state back into
the hardware. This same mechanism is used to mediate
hardware accesses between different processes running
OpenGL.
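
The token mechanism can be sketched as follows. This is a simplified model under stated assumptions: the register dictionary, the place where saved contexts are kept, and every name are hypothetical, and real drivers would also need locking around the check.

```python
class SharedHardware:
    """Hypothetical device state shared by two drivers."""

    def __init__(self):
        self.access_token = None    # token of the last driver to touch the hardware
        self.registers = {}         # stands in for the hardware state
        self.saved_contexts = {}    # token -> state saved in software buffers


class Driver:
    def __init__(self, token, hardware):
        self.token = token
        self.hw = hardware

    def acquire(self):
        """Switch contexts only if another driver touched the hardware last."""
        hw = self.hw
        if hw.access_token != self.token:
            if hw.access_token is not None:
                # Context save: read the hardware state into software buffers.
                hw.saved_contexts[hw.access_token] = dict(hw.registers)
            # Context restore: put this driver's previously saved state back.
            hw.registers = dict(hw.saved_contexts.get(self.token, {}))
            hw.access_token = self.token

    def write(self, register, value):
        self.acquire()
        self.hw.registers[register] = value

    def read(self, register):
        self.acquire()
        return self.hw.registers.get(register)
```

When one driver makes many accesses in a row, the token check succeeds each time and no save or restore takes place, so the cost is paid only on an actual switch.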

Integration of VISUALIZE fx 4 Architecture 

A fourth challenge for the team was the integration of the
VISUALIZE fx 4 stacked planes architecture (Figure 5a)
into the Windows NT environment. Workstations traditionally
have very deep pixels, each pixel having up to
90 bits of information. This information includes support
for such things as transparent overlays, double buffering,
hidden surface removal, and clipping. Windows NT expects
a slightly different model, in which the extra per-pixel
information is allocated in offscreen storage when a 3D
rendering context is created (Figure 5b). What this means
is that when the window state is changed (for example,
when a window is moved on the desktop), Windows NT
does not make any special calls to the device driver. This
presented a problem, since our stacked planes architecture
needs to keep all of the extra information directly
associated with the correct visible screen pixels.

Figure 5

(a) VISUALIZE fx 4 stacked frame buffer model. (b) Windows
NT offscreen frame buffer model.

To fix this problem, we used a Windows mechanism
called a window object (Figure 6). The window object
tracks a window state and executes callbacks into our
driver when a window state is modified. This added an
unfortunate amount of complexity to our driver, since
the window state is asynchronous to all other hardware
accesses and not all of the window state information we
need was directly available to us. In addition, applications
expect to be able to mix Windows NT graphical device
interface rendering and 3D OpenGL rendering in the same
window.

Figure 6

The components of a window object.

These two problems required us to add a double
buffering mechanism that actually copies the physical
back buffer bits into the displayed front buffer. This is
significantly slower than the native per-pixel double buffering
of VISUALIZE fx 4. However, it fits better into the
Windows NT model and enables all applications to run.
We still enable the native method for applications and
benchmarks that work correctly with it, since it is significantly
faster.
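
The copy-based form of double buffering can be illustrated with a minimal sketch. The buffers are modeled here as lists of pixel rows and the window rectangle as (x, y, width, height); both choices are assumptions for illustration and say nothing about the actual driver.

```python
def copy_swap(front, back, rect):
    """Copy the window's rectangle from the back buffer into the front buffer."""
    x0, y0, width, height = rect
    for y in range(y0, y0 + height):
        # Every pixel in the window is physically moved, which is why this
        # approach costs more than switching a per-pixel buffer select.
        front[y][x0:x0 + width] = back[y][x0:x0 + width]
```

The cost grows with window area, in contrast to a native swap, which only changes which buffer each pixel displays.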

Performance 

A fifth challenge for the team was performance. In the
graphics workstation market, performance is usually the
main differentiator. The most popular single measure of
performance in the PC graphics market is the OPC Viewperf
benchmark known as CDRS-03.¹ By July 1997, we
had achieved a CDRS-03 rating of 74, a performance
level that exceeded all known competitors. This met the
goals set at the beginning of the project. However, we
were aware that the hardware was capable of supporting
much higher performance. With the goal of a SIGGRAPH 97
announcement in August, we redesigned the
device driver. The redesign optimized certain paths
through the driver, enabling much higher performance
for this benchmark and for important applications such as
Unigraphics and those from Structural Dynamics Research
Corporation (SDRC). As a result, we were able to announce a
CDRS-03 rating of over 100 at SIGGRAPH 97.

In addition to benchmark performance, the team focused
on application performance because it is typically this
measure that determines whether a customer will buy the
product. We obtained a variety of in-house applications
and built up expertise in running the applications. We
also obtained data sets that represented typical customer
workloads and adjusted various performance parameters
(such as display list size) to maximize performance for
the benchmark. Using this technique, the performance
with some data sets was up to 100 times faster.



Conclusion 



With VISUALIZE fx 4, Hewlett-Packard has the fastest
Windows NT graphics on the market.²,³ Integrated into
the HP Kayak XW platform, the graphics device and its
successors will help Hewlett-Packard maintain its market
leadership.

Concurrent engineering is the convergence, in time and purpose, of
interdependent engineering tasks. The benefits of concurrent engineering
versus traditional serial dependency are shown in Figure 1. Careful planning
and management of the concurrent engineering process result in:

■ Faster time to market

■ Lower engineering expenses

■ Improved schedule predictability.

This article discusses the use of concurrent engineering for OpenGL product
development at the HP Workstation Systems Division.

OpenGL Concurrent Engineering 

We applied concurrent engineering concepts in the development of our
OpenGL product in a number of ways, including:

■ Closely coupled system design with partner laboratories

■ Software architecture and design verification

■ Real-use hardware verification

■ Hardware simulation

■ Milestones and communication

■ Joint hardware and software design reviews

■ Test programs written in parallel.
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Cultural Enablers 

In addition to these technical tactics, the OpenGL team
enjoyed the benefits of several cultural enablers that have
been nurtured over many years to encourage concurrent
engineering. These include early concurrent staffing, an
environment that invites, expects, and supports bottom-up
ideas to improve time to market, and the use of a focused
program team to use expertise and gain acceptance from
all functional areas and partners.

System Design with Partner Labs 

We worked closely with the compiler and operating system
laboratories to design new features to greatly improve
our performance (see the "System Design Results"
section in the article on page 9). Our early system design
revealed that OpenGL inherently requires approximately
ten times more procedure calls and graphics device accesses
than our previous graphics libraries. This large
increase in system use meant we had to minimize
costs that we previously had been able to amortize over a
complete primitive.



We worked closely with our partner laboratories to ensure
success. Our management secured partner acceptance,
funding, and staffing, and the engineers worked on the
joint system design. Changes of this magnitude in the
kernel and the compiler take time, and we could not afford
to wait until we had graphics hardware and software
running for problems to surface. Rather, we used careful
system performance models and competitive performance
projections to create processor state count budgets for
procedure calls and device access. These performance
goals guided our design. In fact, our first design to improve
procedure call overhead missed by a few states per call,
so we had to get more creative with our design to arrive
at an industry-leading solution. We managed these dependencies
throughout the project with frequent communication
and interim milestones.

Software Architecture and Design Verification 

We designed and followed a risk-driven life cycle. To support
the concurrent engineering model, we needed a life
cycle that avoided the big bang approach of integrating all
the pieces at the end. That approach would result in a longer and
less predictable time to market. Instead, we created a
prototyping environment. This environment was initially
created to test the software architecture and early design
decisions. The life cycle included a number of checkpoints
focused on interface specification, design, and
prototyping.

Figure 1

The benefits of concurrent engineering.

Figure 2

OpenGL concurrent engineering techniques.

One key prototyping checkpoint in this environment is
what we called our "vertical slice," which represented a
thin, tall slice through the early OpenGL architecture (see
Figure 2). It is thin because it supports a small subset of the
full OpenGL functionality, and tall because it exercises all
portions of the software architecture, from the API to the
device driver-level interface. With this milestone, we had
a simple OpenGL demonstration running on our software
prototype.

The objectives of this vertical slice were to verify the
OpenGL software architecture and design, create a
prototyping design environment, and rally the team around this
key deliverable.



Hardware Verification 

Before we had completed verification of the software
architecture, it became evident that this same environment
needed to be quickly adapted and evolved to handle the
demands of hardware verification. OpenGL features and
performance represented the biggest challenge for the
new VISUALIZE fx hardware. Although this hardware
would also support our legacy APIs (Starbase, PHIGS,
PEX), most of the newness and therefore risk was contained
in our support of OpenGL. By evolving our
prototyping environment for use as the hardware verification
vehicle, we were able to exercise the hardware model in
real-use scenarios (albeit considerably slower than full
performance).

Evolving this environment for hardware verification required
us to take the prototyping further than we would
have for software verification alone. We had to add more
functionality to more fully test the OpenGL features in
hardware. We also had to do so quickly to avoid delaying
the hardware tape release.

This led to our second key prototyping checkpoint, which
we called "OpenGL turn on." This milestone included the
same OpenGL demonstration running on the VISUALIZE
fx hardware simulator. We also added functionality
breadth to the vertical slice (see Figure 2). Doing all this
for a new OpenGL API represented a new level of concurrent
engineering, in that we were running OpenGL programs
on a prototype OpenGL library and driver and displaying
pictures on simulated VISUALIZE fx hardware, all
more than a year before shipments.

The key objective of this milestone was to verify system
design across the API, driver, operating system, and hardware.
The system generated pictures and, more importantly,
spool files (command and data streams that cross
the hardware and software interface). These spool files
are then run against the hardware models to verify hardware
design under real OpenGL use scenarios.
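
The idea of replaying a recorded stream against more than one model can be sketched as follows. The spool format shown (a list of address and value pairs) and the toy models are assumptions for illustration; the article does not describe the real spool file layout or the simulators' interfaces.

```python
class RegisterModel:
    """Toy hardware model: remembers the last value written to each address."""

    def __init__(self):
        self.registers = {}

    def write(self, address, value):
        self.registers[address] = value


class DroppedWriteModel(RegisterModel):
    """Deliberately faulty model that ignores writes to one address."""

    def __init__(self, dead_address):
        super().__init__()
        self.dead_address = dead_address

    def write(self, address, value):
        if address != self.dead_address:
            super().write(address, value)


def replay(spool, model):
    """Feed every recorded write to a model and return its final state."""
    for address, value in spool:
        model.write(address, value)
    return model.registers


def models_agree(spool, reference, candidate):
    """True if both models end in the same state after the same stream."""
    return replay(spool, reference) == replay(spool, candidate)


# Hypothetical spool: a short stream captured at the hardware/software boundary.
spool = [(0x10, 1), (0x14, 7), (0x10, 3)]
```

Because the stream is captured from real library and driver code, a disagreement between models points to a design flaw that a hand-written test might never exercise.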

This prototyping environment has the following
advantages:

■ Reduces risk for system design and component design
  □ Resolve integration issues early
  □ Identify holes and design or architecture flaws
  □ Enable prototyping to evaluate design alternatives

■ Enables key deliverables (hardware verification spool
files)

■ Creates exciting focal points for developers

■ Fosters teamwork

■ Enables joint development

■ Provides a means to monitor progress

■ Provides a jump start to our code development phase.

This environment also has potential downsides. We felt
there was a risk that the need
or desire to prototype (for system turn on and hardware
verification) could overshadow the importance of product
design in developers' minds. We did not want to leave engineers
with the model: write some code, give it a try, and ship it if it works.

Thus, to keep the benefits of this environment and mitigate
these potential downsides, we made a conscious decision
to switch gears from system turn on and prototype
mode to product code development mode. This point
came after we had delivered the spool files required for
hardware verification and before we had reached our
design complete checkpoint. From that point on, we
prototyped only for design purposes, not for enabling
more system functionality. We also created explicit checkpoints
for replacing previously prototyped code with
designed product code. This was an important shift to
avoid shipping prototype code. All product code had to
be designed and reviewed.

Hardware Simulation 

One key factor in our concurrent engineering process is
hardware simulation. A detailed discussion of the hardware
simulation techniques used in our project is beyond
the scope of this article. Briefly, we use three levels
of hardware simulation:

■ A behavioral model (written in C)

■ A register transfer level model (RTL)

■ A gate model, which models the gate design and
implementation.

The advantages of the behavioral model are that it can be
done well before the RTL and gate models, so we can use it
with other components and prototypes. The behavioral
model is also significantly faster than the other models
(though still about 100 times slower than the real product),
allowing us to run many simple real programs on it. The
RTL model runs in Verilog and runs about one million
times slower than the real product. This limits the number
and size of test cases that can be run. The gate model is
even slower. Even so, we kept over 30 workstations busy
around the clock for months running these models. Often
a simulation run will use C models for all but one of the
new chips, with the one chip being simulated at the gate
level.
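
A quick calculation shows why the slowdown factors limit what can be run. The one-second workload is a hypothetical example; only the factors of 100 and one million come from the text above.

```python
SECONDS_PER_DAY = 86_400


def simulated_seconds(real_seconds, slowdown):
    """Wall-clock time needed to simulate work that takes real_seconds on hardware."""
    return real_seconds * slowdown


# Hypothetical workload: one second of rendering on the real product.
behavioral_seconds = simulated_seconds(1, 100)     # behavioral model in C
rtl_seconds = simulated_seconds(1, 1_000_000)      # RTL model
rtl_days = rtl_seconds / SECONDS_PER_DAY
```

Under these assumptions the behavioral model needs under two minutes while the RTL model needs more than eleven days, which is consistent with keeping many workstations busy around the clock.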

Milestones and Communication 

We set up a number of R&D milestones to guide and track
our progress. The vertical slice and OpenGL turn on were
two such key milestones. OpenGL developer meetings
were held monthly to make sure that everyone had a clear
understanding of where we were headed and how each of
the developers' contributions helped us get there.

Software and Hardware Design Reviews 

The hardware and software engineers also held joint design
reviews. The value of design reviews is that they minimize
defects by enabling all the engineers to have the same
model of the system, and they catch design flaws early so that
they can be corrected while finding and fixing defects is still
inexpensive in terms of schedule and dollars.

On the software side, the review process focused heavily
on up-front design reviews (where changes are cheaper)
to get the design right. We maintained the importance of
doing inspections but reduced the inspection coverage
from 100 percent to a smaller representative subset of
code, as determined by the review team. We also increased
the number of reviewers at the design reviews and
reduced the participation as we moved to code reviews.
We maintained a consistent core set of reviewers who
followed the component from design to code review.

Tests Written in Parallel 

To bring more parallelism to the development process,
we had an outside organization develop our OpenGL test
programs. By doing so, we were able to begin nightly
regression testing simultaneously with the code completion
checkpoint because the test programs were immediately
available. Historically, the developers have written the
tests following design and coding. This translates into
a lull between the code completion checkpoint and the
beginning of the testing phase.

Parallel development of the tests with the design and
implementation of the system was a key success factor
in our ability to ship a high-quality, software-only beta
version of our OpenGL product. No severe defects were
found in this beta product, our first OpenGL customer
deliverable.

One thing we learned from using an outside organization 
to help with test writing was that writing test plans is 
more a part of design than of testing. The developers, 
with intimate knowledge of the API and the design, were 
able to write much more comprehensive test plans than 
the outside organization. 



Conclusion 



We achieved several positive results through the use of
concurrent engineering on our OpenGL product. Ultimately,
we reduced time to market by several months.
Along the way, we made performance and reliability improvements
in our software and hardware architectures
and implementations, and we likely prevented a chip turn
or two, which would have cost significant time to market.

Silicon Graphics and OpenGL are registered trademarks of Silicon Graphics Inc. in the United
States and other countries.

Direct 3D is a U.S. registered trademark of Microsoft Corporation.
Microsoft is a U.S. registered trademark of Microsoft Corporation.
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Multiple monitors can be configured as a contiguous viewing space to 
provide more screen space so that users can see most, if not all, of their 
applications without any special window manipulations. 



In today's computing environment, screen space is at a premium. The
entire screen can be easily consumed when primary work-specific applications
are used together with browsers, schedulers, mailers, and editors. This forces
the user to continuously shuffle windows, which is both distracting and
unproductive.

The advanced display technologies described here allow users to increase
productivity by reducing the time spent manipulating windows. Three
technologies are discussed:

■ Multiscreen

■ Single logical screen (SLS)

■ SLSclone.

Implementation details and procedures for configuring HP-UX workstations to
use the SLS technology are described in references 1 and 2.

Multiscreen 

When considering the problem of limited screen space, one solution that 
comes to mind is to use a bigger monitor with a higher resolution. 
Unfortiu lately, it is often impractical to add a monitor with a resolution high 
enough to accommodate all the data a user wants to view. Although demand has 
increased for monitors of higher resolution, such as 2K by 2K pixels, they are 
still too expensive for companies to place on every desktop. In addition, these 
large monitors are cumbersome and heavy. There are also
safety considerations: the monitor must be stable and
properly supported.

A more practical, cost-effective solution is to use additional
standalone monitors to increase the amount of visible screen
space. The X Window System (X11) standard incorporates a
feature known as multiscreen, which provides this type of
environment. In multiscreen configurations, a single X server
is used to control more than one graphics device and monitor
simultaneously. These types of configurations are only
possible on systems containing multiple graphics devices.

In these multiscreen scenarios, a single mouse and key- 
board are shared between screens. This allows the pointer 
but not the windows to move between screens. Each ap- 
plication must be directed to a specific screen to display 
its windows. This is done by either using the -display com- 
mand line argument or by setting the DISPLAY environ- 
ment variable. 
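
As a minimal sketch of directing a client to one screen, the snippet below builds an X display name in the usual host:server.screen form and places it in the DISPLAY variable of a child environment. The helper names are illustrative only, and the commented-out launch of xterm is a hypothetical usage rather than a procedure taken from this article.

```python
import os


def display_name(host: str, server: int, screen: int) -> str:
    """Build an X display name of the form host:server.screen."""
    return f"{host}:{server}.{screen}"


def env_for_screen(screen: int, host: str = "", server: int = 0) -> dict:
    """Copy the current environment and point DISPLAY at one screen."""
    env = dict(os.environ)
    env["DISPLAY"] = display_name(host, server, screen)
    return env


# Hypothetical usage: start a client on the second screen (screen 1).
# import subprocess
# subprocess.Popen(["xterm"], env=env_for_screen(1))
# The command-line form would be: xterm -display :0.1
```

Because the screen is chosen when the client starts, every window the client creates afterward stays on that screen.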

Figure 1 shows a two-monitor multiscreen configuration.
Both monitors are connected to the same workstation and
are controlled by the same X server. This type of configuration
effectively doubles the visible workspace. For example,
users could have their alternate applications, such as
web browsers, mailers, and schedulers, on the left-hand
monitor and their primary applications on the right-hand
monitor. Since the X server controls both screens, the
pointer can move between screens and be used with any
application.

Multiscreen offers the advantage that it will work with
any graphics device. There are no constraints that the
graphics devices be identical or have the same properties.



Figure 1 

A multiscreen configuration. 
Figure 2

Cursor wraparound in a multiscreen configuration.
For example, on an HP 9000 Model 715 workstation containing
an HCRX24 display (a 24-plane device) and an internal color
graphics display (an 8-plane device), the user can still
create a multiscreen configuration. Of course, those
applications directed to the HCRX24 will have access to
24 planes while those contained on the other are limited to
8 planes. Currently, the HP-UX X server allows a maximum of
four graphics devices to be used in a multiscreen
configuration.

The HP-UX X server also provides several enhancements
to simplify the use of a multiscreen configuration. If a user
has a 1-by-3 configuration (Figure 2a), there may be a
need to move the pointer from screen 3 to screen 1. This
requires moving the pointer from screen 3 to screen 2 to
screen 1. By specifying an X server configuration option,
the user can move the pointer off the right edge of screen
3, and the pointer will wrap to screen 1 (Figure 2b). The
same screen wrapping functionality can be provided if the
user has configured the screens in a column. Finally, a
2-by-2 configuration can contain both horizontal and
vertical screen wrapping.
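
The wraparound behavior can be pictured as modular arithmetic over a grid of screens. The following is only an illustrative model, not the X server's implementation; the function name and the zero-based (column, row) numbering are assumptions made for the sketch.

```python
def next_screen(col, row, dx, dy, cols, rows, wrap=True):
    """Return the (col, row) of the screen the pointer enters.

    dx and dy are -1, 0, or +1 and name the edge the pointer crossed.
    Without wrapping, the pointer stays on the current screen when it
    reaches an outer edge of the layout.
    """
    new_col, new_row = col + dx, row + dy
    if wrap:
        # Wrap in both directions, as in a 2-by-2 layout.
        return new_col % cols, new_row % rows
    if 0 <= new_col < cols and 0 <= new_row < rows:
        return new_col, new_row
    return col, row


# 1-by-3 layout: leaving the right edge of screen 3 (column 2)
# lands on screen 1 (column 0) when wrapping is enabled.
print(next_screen(2, 0, +1, 0, cols=3, rows=1))
```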

Although multiscreen is convenient, it has shortcomings.
Namely, the monitors function as separate entities, rather
than as a contiguous space. The different screens within a
multiscreen configuration cannot communicate with one
another with respect to window placement. This means
that windows cannot be moved between monitors. Once
a window is created, it is bound to the monitor where it is
created. Although some third-party solutions are available
to help alleviate this problem, they are costly, inconvenient
(sometimes requiring the application to make code
changes), and lack performance.
The lack of communication between screens with respect
to window placement forces users to direct their applications
towards a specific screen at application start time.
After a screen has been selected, all additional subwindows
will be confined to that screen. With today's larger
applications, it is possible to find that certain screens still
get overcrowded, resulting in the original predicament of
having to iconify and raise windows.

Single Logical Screen 

To remedy the shortfall of the multiscreen configuration, 
HP developed a technology called single logical screen 
(SLS). SLS has been incorporated into the HP standard
X Server product and allows multiple monitors to act as a 
single, larger, contiguous screen. As a result, windows can 
move across physical screen boundaries, and they can 
span more than one physical monitor. In addition, SLS 
functionality has been implemented in an application- 
transparent manner. This means that any application cur- 
rently running on HP-UX workstations will run, without 
modification, under SLS. Therefore, SLS is not an API that 
application writers need to program to or that an applica- 
tion needs to be aware of. The application simply sees a 
large screen. This ease-of-use lets end users take advan- 
tage of a large workspace without requiring applications 
to be rewritten or recompiled. 

Many electronic design automation (EDA) and computer-aided
design applications can benefit from SLS. Some of
these applications, by themselves, can easily occupy an
entire screen while only showing a fraction of the desired
information. For example, with more screen real estate,
an EDA application can simultaneously display waveforms,
schematics, editors, and other data without having
any of this information obscured. To do this on a workstation
with only a single monitor would require displaying
the waveforms, schematics, and other items in such
small areas as to be unreadable.

On HP-UX workstations, a single logical screen actually
represents a collection of homogeneous graphics devices
whose output has been combined into a single screen.
Figure 3 shows an example of a 1-by-2 SLS configuration.
Most HP-UX workstations are not limited to only
two graphics devices. Some models support up to four
devices. When using these graphics devices to create an
SLS environment, any rectangular configuration is allowed.
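
To illustrate how a window can span physical monitors in a rectangular SLS layout, the sketch below maps a window rectangle in logical-screen coordinates to the monitors it overlaps. The 1280-by-1024 monitor resolution and the function name are assumptions chosen for the example; the article does not describe how the X server performs this mapping.

```python
def monitors_spanned(x, y, width, height, cols, rows,
                     mon_w=1280, mon_h=1024):
    """List the (col, row) monitors overlapped by a window.

    Coordinates are pixels on the single logical screen, whose size
    is (cols * mon_w) by (rows * mon_h). The monitor resolution is an
    assumed value, not one taken from the article.
    """
    # Clip the window to the logical screen.
    x0, y0 = max(x, 0), max(y, 0)
    x1 = min(x + width, cols * mon_w)
    y1 = min(y + height, rows * mon_h)
    if x0 >= x1 or y0 >= y1:
        return []  # entirely off screen
    first_col, last_col = x0 // mon_w, (x1 - 1) // mon_w
    first_row, last_row = y0 // mon_h, (y1 - 1) // mon_h
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# A window straddling the boundary of a 1-by-2 layout touches both
# monitors, while the application still sees one large screen.
print(monitors_spanned(1000, 100, 600, 400, cols=2, rows=1))
```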



Figure 3 

A 1-by-2 SLS configuration.
SLSclone

SLSclone is similar to the SLS configuration. The differ- 
ence is that the contents from a selected monitor are 
replicated on all other monitors in the configuration (see 
Figure 4). A user can dynamically switch between SLS
and SLSclone using an applet shipped with the
HP-UX 10.20 patch PHSS_12462 or later.

This functionality is useful in an educational or instructional
environment. Instead of crowding many users
around a single monitor to view its contents, SLSclone
can be used to pipe these contents to neighboring monitors.
As with SLS, SLSclone currently supports up to four
physical monitors, depending on the workstation model.

SLSclone functionality easily lends itself to a collaborative 
work environment. If additional people enter a user's 
office to debug some software source code, for example, 
the user can quickly switch the SLS configuration into an 
SLSclone configuration, and the debugging screen will be 
displayed on all monitors. Also, the additional monitor 
can easily be adjusted to the correct height and tilt with- 
out affecting the original user's view of the display. 



Figure 4 

An example of a 1-by-2 SLSclone configuration.
Figure 5

A hybrid configuration consisting of a 1-by-2 SLS with
multiscreen.
SLS and Multiscreen

Even with the benefits of SLS, there may be cases in
which a user will want to use SLS and multiscreen at the
same time. For example, a user could have a 1-by-2 SLS
configuration acting as one screen, and a third monitor
acting as a second screen. A depiction of this is shown in
Figure 5.

In this type of configuration, a user can move windows
between physical monitors 1 and 2 but not drag a window
from monitor 2 to monitor 3. The pointer, however, can
move between all monitors. This type of hybrid configuration
can be useful in a software development environment.
All of the necessary editors, compilers, and debuggers can
be used on monitors 1 and 2, and the application can be
run and tested on monitor 3.

If a workstation supports four graphics devices, another 
possible hybrid configuration is to use two screens, 
each of which consists of a two-screen SLS configuration 
(Figure 6).

In this configuration, windows can be moved between
monitors 1 and 2 or between monitors 3 and 4. However, a
window cannot be moved between monitors 2 and 3. As
with all multiscreen configurations, the pointer can move
across all four monitors. These two screens could also
be placed vertically, resulting in a 2-by-2 monitor
arrangement and a 2-by-1 multiscreen configuration.
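
The rules for a hybrid layout can be summarized in a small model: windows move only among monitors that belong to the same X screen, while the pointer moves among all monitors. The dictionary layout and function names below are hypothetical and serve only to restate the behavior described above.

```python
# Each X screen number maps to the physical monitors it drives.
# This mirrors two 1-by-2 SLS screens combined via multiscreen.
HYBRID = {0: [1, 2], 1: [3, 4]}


def screen_of(monitor, layout):
    """Return the X screen that drives the given physical monitor."""
    for screen, monitors in layout.items():
        if monitor in monitors:
            return screen
    raise ValueError(f"monitor {monitor} is not in the layout")


def can_move_window(src, dst, layout):
    """Windows move only between monitors of the same SLS screen."""
    return screen_of(src, layout) == screen_of(dst, layout)


def can_move_pointer(src, dst, layout):
    """The pointer can reach any monitor controlled by the X server."""
    screen_of(src, layout)  # validate that both monitors exist
    screen_of(dst, layout)
    return True
```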



Conclusion 



Advanced display configurations can be used to increase 
productivity. The increase in screen space facilitates col- 
laboration and communication of information. We have 
also found that these configurations are very useful for 
independent software vendors (ISVs) who demonstrate 
their applications on HP-UX workstations. They appreciate
the additional screen space because they are able to
display more information and rapidly describe their
products without losing their customers' attention.

Finally, the configuration of an advanced display is
accomplished in an easy and straightforward manner through
the HP-UX System Administration Manager (SAM). Additional
information on advanced display configurations
and other exciting X server features is available at:
http://www.hp.com/go/xwindow

HP-UX Release 10.20 and later and HP-UX 11.00 and later (in both 32- and 64-bit
configurations) on all HP 9000 computers are Open Group UNIX 95 branded products.

UNIX is a registered trademark of The Open Group.
Figure 6

Two 1-by-2 SLS configurations combined via multiscreen.
In the highly competitive workstation market, customers demand a wide range
of cost-effective, high-performance I/O solutions. An industry-standard I/O 
subsystem allows HP workstations to support the latest I/O technology. 



Industry-standard I/O buses like the Peripheral Component Interconnect
(PCI) allow systems to provide a wide variety of cost-effective I/O functionality.
The desire to include more industry-standard interfaces in computer systems 
continues to increase. This article points out some of the specific
methodologies used to implement and verify the PCI interface in HP workstations and
describes some of the challenges associated with interfacing with industry- 
standard I/O buses. 

PCI for Workstations

One of the greatest challenges in designing a workstation system is determining
the best way to differentiate the design from competing products. This decision
determines where the design team will focus their efforts and have the greatest
opportunity to innovate. In the computer workstation industry, the focus is
typically on processor performance coupled with high-bandwidth, low-latency
memory connections to feed powerful graphics devices. The performance of
nongraphics I/O devices in workstations is increasing in importance, but the
availability of cost-effective solutions is still the chief concern in designing an
I/O subsystem. Rather than providing a select few exotic high-performance I/O
solutions, it is better to make sure that there is a wide range of cost-effective
solutions to provide the I/O functionality that each customer requires. Since
I/O performance is not a primary means of differentiation and since maximum
flexibility with appropriate price and performance is desired, using an
industry-standard I/O bus that operates with high-volume
cards from multiple vendors is a good choice.

The PCI bus is a recently established standard that has
achieved wide acceptance in the PC industry. Most new
general-purpose I/O cards intended for use in PCs and
workstations are now being designed for PCI. The PCI
bus was developed by the PCI Special Interest Group
(PCI SIG), which was founded by Intel and now consists
of many computer vendors. PCI is designed to meet today's
I/O performance needs and is scalable to meet future
needs. Having PCI in workstation systems allows the use
of competitively priced cards already available for use in
the high-volume PC business. It also allows workstations
to keep pace with new I/O functionality as it becomes
available, since new devices are typically designed for the
industry-standard bus first and only later (if at all) ported
to other standards. For these reasons, the PCI bus has
been implemented in the HP B-class and C-class
workstations.

PCI Integration Effort 

Integrating PCI into our workstation products required
a great deal of work by both the hardware and software
teams. The hardware effort included designing a bus
interface ASIC (application-specific integrated circuit)
to connect to the PCI bus and then performing functional
and electrical testing to make sure that the implementation
would work properly. The software effort included
writing firmware to initialize and control the bus interface
ASIC and PCI cards and writing device drivers to allow
the HP-UX operating system to make use of the PCI
cards.

The goals of the effort to bring PCI to HP workstation 
products were to: 

■ Provide our systems with fully compatible PCI to 
allow the support of a wide variety of I/O cards and 
functionality 

■ Achieve an acceptable performance in a cost-effective 
manner for cards plugged into the PCI bus 

■ Create a solution that does not cause performance
degradation in the CPU-memory-graphics path or in any
of the other I/O devices on other buses in the system



■ Ship the first PCI-enabled workstations: the Hewlett- 
Packard B132, B160, C160, and C180 systems. 

Challenges 

Implementing an industry-standard I/O bus might seem 
to be a straightforward endeavor. The PCI interface has 
a thorough specification, developed and influenced by
many experts in the field of I/O bus architectures. There 
is momentum in the industry to make sure the standard 
succeeds. This momentum includes card vendors work- 
ing to design I/O cards, system vendors working through 
the design issues of the specification, and test and mea- 
surement firms developing technologies to test the design 
once it exists. Many of these elements did not yet exist 
and were challenges for earlier Hewlett-Packard proprietary
I/O interface projects.

Although there were many elements in the team's favor
that did not exist in the past, there were still some significant
tasks in integrating this industry-standard bus. These
tasks included:

■ Designing the architecture for the bus interface ASIC,
which provides a high-performance interface between
the internal proprietary workstation buses and PCI

■ Verifying that the bus interface ASIC does what it is
intended to do, both in compliance with PCI and in
performance goals defined by the team

■ Providing the necessary system support, primarily in
the form of firmware and system software to allow
cards plugged into the slots on the bus interface ASIC
to work with the HP-UX operating system.

With these design tasks identified, there still remained 
some formidable challenges for the bus interface ASIC 
design and verification and the software development 
teams. These challenges included ambiguities in the PCI 
specification, difficulties in determining migration plans, 
differences in the way PCI cards can operate within the 
PCI specification, and the unavailability of PCI cards
with the necessary HP-UX drivers. 
Architecture

The Bus Interface ASIC 

The role of the bus interface ASIC is to bridge the HP 
proprietary I/O bus, called the general system connect
(GSC) bus, to the PCI bus in the HP B-class and C-class 
workstations. Figures 1 and 2 show the B-class and 
C-class workstation system block diagrams with the bus 
interface ASIC bridging the GSC bus to the PCI bus. The 
Runway bus shown in Figure 2 is a high-speed processor- 
to-memory bus. 1 

The bus interface ASIC maps portions of the GSC bus
address space onto the PCI bus address space and vice
versa. System firmware allocates addresses to map between
the GSC and PCI buses and programs this information
into configuration registers in the bus interface ASIC.
Once programmed, the bus interface ASIC performs the
following tasks:

■ Forwards write transactions from the GSC bus to the
PCI bus. Since the write originates in the processor, this
task is called a processor I/O write.

■ Forwards read requests from the GSC bus to the PCI
bus, waits for a PCI device to respond, and returns the
read data from the PCI bus back to the GSC bus. Since
the read originates in the processor, this task is called
a processor I/O read.

■ Forwards write transactions from the PCI bus to the
GSC bus. Since the destination of the write transaction
is main memory, this task is called a direct memory
access (DMA) write.

■ Forwards read requests from the PCI bus to the GSC
bus, waits for the GSC host to respond, and returns the
read data from the GSC bus to the PCI bus. Since the
source of the read data is main memory, this task is
called a DMA read.

Figure 1

HP B-class workstation block diagram.
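
The address mapping that firmware programs into the bridge can be pictured as a set of windows, each translating a range on one bus to a range on the other. The sketch below is a simplified model under assumed values: the class name, field names, and the example base addresses are hypothetical and are not the ASIC's actual register layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Window:
    """One address window programmed by firmware (hypothetical model)."""
    src_base: int   # base address on the originating bus
    dst_base: int   # base address on the destination bus
    size: int       # window length in bytes

    def translate(self, addr: int) -> Optional[int]:
        """Map a source-bus address to the destination bus.

        Returns None when the address falls outside the window, in
        which case the bridge would not forward the transaction.
        """
        offset = addr - self.src_base
        if 0 <= offset < self.size:
            return self.dst_base + offset
        return None


# Assumed example: a 4-KB processor I/O window from GSC space to PCI space.
pio_window = Window(src_base=0xF0000000, dst_base=0x00000000, size=0x1000)
```

A DMA window would use the same structure with PCI as the source bus and GSC as the destination.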

Figure 3 shows a block diagram of the internal architec- 
ture of the bus interface ASIC. The bus interface ASIC 
uses five asynchronous FIFOs to send address, data, and 
transaction information between the GSC and PCI buses.



Figure 2 

HP C-class workstation block diagram. 
Figure 3

A block diagram of the architecture for the bus interface ASIC.
A FIFO is a memory device that has a port for writing data
into the FIFO and a separate port for reading data out of 
the FIFO. Data is read from the FIFO in the same order 
that it was written into the FIFO. The GSC bus clock is 
asynchronous to the PCI bus clock. For this reason, the 
FIFOs need to be asynchronous. An asynchronous FIFO 
allows the data to be written into the FIFO with a clock 
that is asynchronous to the clock used to read data from 
the FIFO. 

Data flows through the bus interface ASIC are as follows:

■ Processor I/O write:

- The GSC interface receives both the address and the
  data for the processor I/O write from the GSC bus and
  loads them into the processor I/O FIFO.

- The PCI interface arbitrates for the PCI bus.

- The PCI interface unloads the address and data from
  the processor I/O FIFO and masters the write on the
  PCI bus.

■ Processor I/O read:

- The GSC interface receives the address for the
  processor I/O read from the GSC bus and loads it into the
  processor I/O FIFO.

- The PCI interface arbitrates for the PCI bus.

- The PCI interface unloads the address from the
  processor I/O FIFO and masters a read on the PCI bus.

- The PCI interface waits for the read data to return and
  loads the data into the processor I/O read return FIFO.

- The GSC interface unloads the processor I/O read
  return FIFO and places the read data on the GSC bus.

■ DMA write:

- The PCI interface receives both the address and the
  data for the DMA write from the PCI bus and loads
  them into the DMA FIFO.

- The PCI interface loads control information for the
  write into the DMA transaction FIFO.

- The GSC interface arbitrates for the GSC bus.

- The GSC interface unloads the write command from
  the DMA transaction FIFO, unloads the address and
  data from the DMA FIFO, and masters the write on
  the GSC bus.

■ DMA read:

- The PCI interface receives the address for the DMA
  read from the PCI bus and loads it into the DMA FIFO.

- The GSC interface arbitrates for the GSC bus.

- The GSC interface unloads the address from the DMA
  FIFO and masters a read on the GSC bus.

- The GSC interface then waits for the read data to
  return and loads the data into the DMA read return
  FIFO.

- The PCI interface unloads the DMA read return FIFO
  and places the read data on the PCI bus.
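
The DMA write flow can be sketched in software as two first-in, first-out queues shared by a PCI-side producer and a GSC-side consumer. This is a behavioral illustration only: it does not model the two clock domains or bus arbitration, and the FIFO depths, the 4-byte word size, and all names are assumptions.

```python
from collections import deque


class Fifo:
    """Simple FIFO with a load (write) side and an unload (read) side."""

    def __init__(self, depth):
        self.depth = depth
        self._items = deque()

    def load(self, item):
        if len(self._items) >= self.depth:
            raise OverflowError("FIFO full")
        self._items.append(item)

    def unload(self):
        if not self._items:
            raise IndexError("FIFO empty")
        return self._items.popleft()  # same order as written


def dma_write(address, data, dma_fifo, txn_fifo, gsc_memory):
    """Model one DMA write passing from the PCI side to the GSC side."""
    # PCI interface: load address and data, then the control information.
    dma_fifo.load(address)
    for word in data:
        dma_fifo.load(word)
    txn_fifo.load(("write", len(data)))

    # GSC interface (after arbitration, not modeled): unload and master.
    _command, count = txn_fifo.unload()
    base = dma_fifo.unload()
    for i in range(count):
        gsc_memory[base + 4 * i] = dma_fifo.unload()  # assumed 4-byte words


memory = {}
dma_write(0x100, [11, 22, 33], Fifo(16), Fifo(4), memory)
```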

Architectural Challenges 

One of the difficulties of joining two dissimilar I/O buses is
achieving peak I/O bus performance despite the fact that
the transaction structures are different for both I/O buses.
For example, transactions on the GSC bus are fixed length
with not more than eight words per transaction while
transactions on the PCI bus are of arbitrary length. It is
critical to create long PCI transactions to reach peak
bandwidth on the PCI bus. For better performance and
whenever possible, the bus interface ASIC coalesces
multiple processor I/O write transactions from the GSC bus
into a single processor I/O write transaction on the PCI
bus. For DMA writes, the bus interface ASIC needs to
determine the optimal method of breaking variable-size PCI
transactions into one-, two-, four-, or eight-word GSC
transactions. The PCI interface breaks DMA writes into
packets and communicates the transaction size to
the GSC interface through the DMA transaction FIFO.
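
One plausible way to break a variable-length PCI burst into GSC-sized packets is a greedy, largest-first split. The article does not say how the ASIC chooses its packets, so the following is a sketch under stated assumptions: the function name is hypothetical, addresses are counted in words, and the optional natural-alignment rule is an assumption rather than a documented GSC requirement.

```python
GSC_SIZES = (8, 4, 2, 1)  # allowed GSC transaction lengths, in words


def split_dma_write(start_word, length, aligned=True):
    """Split a PCI burst into (word_address, size) GSC packets.

    Picks the largest allowed size that fits the remaining length.
    When aligned is True, a packet of N words must also start on an
    N-word boundary (an assumed constraint for this sketch).
    """
    packets = []
    addr, remaining = start_word, length
    while remaining > 0:
        for size in GSC_SIZES:
            if size <= remaining and (not aligned or addr % size == 0):
                packets.append((addr, size))
                addr += size
                remaining -= size
                break  # a one-word packet always fits, so this is reached
    return packets


# A 13-word burst starting on an eight-word boundary.
print(split_dma_write(0, 13))
```

Each packet size is what the PCI interface would pass along through the DMA transaction FIFO in this model.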



Another difficulty of joining two dissimilar I/O buses is
avoiding deadlock conditions. Deadlock conditions can
occur when a transaction begins on both the GSC and PCI
buses simultaneously. For example, if a processor I/O read
begins on the GSC bus at the same time a DMA read
begins on the PCI bus, then the processor I/O read will wait
for the DMA read to be completed before it can master its
read on the PCI bus. Meanwhile, the DMA read will wait
for the processor I/O read to be completed before it can
master its read on the GSC bus. Since both reads are waiting
for the other to be completed, we have a deadlock
case. One solution to this problem is to detect the deadlock
case and retry or split one of the transactions to
break the deadlock. In general, the bus interface ASIC
uses the GSC split protocol to divide a GSC transaction
and allow a PCI transaction to make forward progress
whenever it detects a potential deadlock condition.

Unfortunately, the bus interface ASIC adds more latency
to the round trip of DMA reads. This extra latency can
have a negative effect on DMA read performance. The
C-class workstation has a greater latency on DMA reads
than the B-class workstation. This is due primarily to the
extra layer of bus bridges that the DMA read must traverse
to get to memory and back (refer to Figures 1 and 2).
The performance of DMA reads is important to outbound 
DMA devices such as network cards and disk controllers. 
The extra read latency is hidden by prefetching consecu- 
tive data words from main memory with the expectation 
that the I/O device needs a block of data and not just a 
word or two. 

Open Standard Challenges 

The PCI bus specification, like most specifications, is not
perfect. There are areas where the specification is vague
and open to interpretation. Ideally, when we find a vague
area of a specification, we investigate how other designers
have interpreted the specification and follow the
trend. With a proprietary bus this often means simply
contacting our partners within HP and resolving the issue.
With an industry-standard bus, our partners are not within
the company, so resolving the issue is more difficult. The
PCI mail reflector, which is run by the PCI SIG at
www.pcsig.com, is sometimes helpful for resolving such
issues. Monitoring the PCI mail reflector also gives the
benefit of seeing what parts of the PCI specification
appear vague to others. Simply put, engineers designing to
a standard need a forum for communicating with others
using that standard. When designing to an industry
standard, that forum must by necessity include wide
representation from the industry.

The PCI specification has guidelines and migration plans
that PCI card vendors are encouraged to follow. In practice,
PCI card vendors are slow to move from legacy
standards to follow guidelines or migration plans. For
example, the PCI bus supports a legacy* I/O address
space that is small and fragmented. The PCI bus also has
a memory address space that is large and has higher write
bandwidth than the I/O address space. For obvious reasons,
the PCI specification recommends that all PCI cards
map their registers to the PCI I/O address space and the
PCI memory address space so systems will have the most
flexibility in allocating base addresses to I/O cards. In
practice, most PCI cards still only support the PCI I/O
address space. Since we believed that the PCI I/O address space
would almost never be used, trade-offs were made in the
design of the bus interface ASIC that compromised the
performance of transactions to the PCI I/O address space.

Another example in which the PCI card vendors follow 
legacy standards rather than PCI specification guidelines 
is in the area of PCI migration from 5 volts to 3.3 volts. 
The PCI specification defines two types of PCI slots: one 
for a 5-volt signaling environment and one for a 3.3-volt
signaling environment. The specification also defines
three possible types of I/O cards: 5-volt only, 3.3-volt only, 
or universal. As their names imply, 5-volt-only and 3.3-volt- 
only cards only work in 5-volt and 3.3-volt slots respec- 
tively. Universal cards can work in either a 5-volt or 
3.3-volt slot. The PCI specification recommends that PCI 
card vendors only develop universal cards. Even though
it costs no more to manufacture a universal card than a
5-volt card, PCI card vendors are slow to migrate to
universal cards until volume platforms (that is, Intel-based
PC platforms) begin to have 3.3-volt slots.

* Legacy refers to the Intel I/O port space.

Verification

Verification Methodology and Goals

The purpose of verification is to ensure that the bus interface
ASIC correctly meets the requirements described in
the design specification. In our VLSI development process
this verification effort is broken into two distinct parts
called phase-1 and phase-2. Both parts have the intent of
proving that the design is correct, but each uses different
tools and methods to do so. Phase-1 verification is carried
out on a software-based simulator using a model of the
bus interface ASIC. Phase-2 verification is carried out on
real chips in real systems.

Phase-1. The primary goals of phase-1 verification can be
summarized as correctness, performance, and compliance.
Proving correctness entails showing that the Verilog model
of the design properly produces the behavior detailed in
the specification. This is done by studying the design
specification, enumerating a function list of operations
and behaviors that the design is required to exhibit, and
generating a suite of tests that verify all items on that
function list. Creating sets of randomly generated
transaction combinations enhances the test coverage by
exposing the design to numerous corner cases.
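
Randomly generated transaction combinations of the kind mentioned above are often produced from a seeded generator so that a failing sequence can be replayed. The sketch below is an assumed illustration in Python, not the team's Verilog test environment; the transaction names, address range, and burst lengths are hypothetical.

```python
import random

# The four transaction types handled by the bus interface ASIC.
KINDS = ("pio_write", "pio_read", "dma_write", "dma_read")


def random_transactions(count, seed, max_words=16):
    """Generate a reproducible list of (kind, address, words) tuples.

    The same seed always yields the same sequence, so a combination
    that exposes a corner case can be rerun exactly.
    """
    rng = random.Random(seed)
    transactions = []
    for _ in range(count):
        kind = rng.choice(KINDS)
        address = rng.randrange(0, 1 << 16) * 4  # word-aligned byte address
        words = rng.randint(1, max_words)
        transactions.append((kind, address, words))
    return transactions


suite = random_transactions(count=100, seed=1998)
```

Each tuple would then drive a simulated bus model, with the results checked against the function list.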

Performance verification is then carried out to prove that
the design meets or exceeds all important performance
criteria. This is verified by first identifying the important
performance cases, such as key bandwidths and latencies,
and then generating tests that produce simulated loads
for performance measurement.

Finally, compliance testing is used to prove that the bus 
protocols implemented in the design will work correctly
with other devices using the same protocol. For a de- 
sign such as the bus interface ASIC that implements an 
industry-standard protocol, special consideration was 
given to ensure that the design would be compatible with 
a spectrum of outside designs. 

Phase-2. This verification phase begins with the receipt
of the fabricated parts. The effort during this phase is
primarily focused on testing the physical components, with
simulation techniques restricted to the supporting role of
duplicating and better understanding phenomena seen
on the bench. The goals of phase-2 verification can be
summarized as compliance, performance, and compatibility.
Therefore, part of phase-2 is spent proving that the
physical device behaves on the bench the same as it did
in simulation. The heart of phase-2, however, is that the
design is finally tested for compatibility with the actual
devices that it will be connected to in a production system.
Verification Challenges

From the point of view of a verification engineer, there
are benefits and difficulties in verifying the implementation
of an industry-standard bus as compared to a
proprietary bus. One benefit is that since PCI is an industry
standard, there are plenty of off-the-shelf simulation and
verification tools available. The use of these tools greatly
reduces the engineering effort required for verification,
but at the cost of a loss of control over the debugging and
feature set of the tools.

The major verification challenge (particularly in phase-1)
was proving compliance with the PCI standard. When 
verifying compliance with a proprietary standard there 
are typically only a few chips that have to be compatible 
with one another. The design teams involved can resolve 
any ambiguity in the bus specification. This activity tends
to involve only a small and well-defined set of individuals. 
In contrast, when verifying compliance with an open stan- 
dard there is usually no canonical source that can provide 
the correct interpretation of the specification. Therefore, 
it is impossible to know ahead of time where devices will 
differ in their implementation of the specification. This 
made it somewhat difficult for us to determine the specific 
tests required to ensure compliance with the PCI standard. 
In the end, it matters not only how faithfully the specification
is followed, but also whether or not the design is
compatible with whatever interpretation becomes dominant.

The most significant challenge in phase-2 testing came in
getting the strategy to become a reality. The strategy
depended heavily on real cards with real drivers to
demonstrate PCI compliance. However, the HP systems with
PCI slots were shipped before any PCI cards with drivers
were supported on HP workstations. Creative solutions
were found to develop a core set of drivers to complete
the testing. However, this approach contributed to having
to debug problems closer to shipment than would have 
been optimal. Similarly, 3.3-volt slots were to be sup- 
ported at first shipment. The general unavailability of 
3.3-volt or universal (supporting 5 volts and 3.3 volts) 
cards hampered this testing. These are examples of the 
potential dangers of "preenabling" systems with new 
hardware capability before software and cards to use 
the capability are ready. 

An interesting compliance issue was uncovered late in
phase-2. One characteristic of the PA 8000 C-class system
is that when the system is heavily loaded, the bus interface
ASIC can respond to PCI requests with either long read
latencies (over 1 µs before acknowledging the transaction)
or many (over 50) sequential PCI retry cycles. Both
behaviors are legal with regard to the PCI 2.0 bus specification,
and both of them are appropriate given the circumstances.
However, neither of these behaviors is exhibited by Intel's 
PCI chipsets, which are the dominant implementation of 
the PCI bus in the PC industry. Several PCI cards worked 
fine in a PC, but failed in a busy C-class system. The PCI 
card vendors had no intention of designing cards that
were not PCI compliant, but since they only tested their 
cards in Intel-based systems, they never found the prob- 
lem. Fortunately, the card vendors agreed to fix this issue 
on each of their PCI cards. If there is a dominant imple- 
mentation of an industry standard, then deviating from 
that implementation adds risk. 

Firmware 

Firmware is the low-level software that acts as the inter- 
face between the operating system and the hardware. 
Firmware is typically executed from nonvolatile memory 
at startup by the workstation. We added the following 
extensions to the system firmware to support PCI: 

■ A bus walk to identify and map all devices on the PCI 
bus 

■ A reverse bus walk to configure PCI devices 

■ Routines to provide boot capability through specified
PCI cards.

The firmware bus walk identifies all PCI devices connected
to the PCI bus and records memory requirements
in a resource request map. When necessary, the firmware
bus walk will traverse PCI-to-PCI bridges.* If a PCI device
has Built-in Self Test (BIST), the BIST is run, and if it fails,
the PCI device is disabled and taken out of the resource 
request map. As the bus walk unwinds, it initializes bridges 
and allocates resources for all of the downstream PCI 
devices. 
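
The bus walk described above can be sketched as a recursive traversal. The Python model below is illustrative only: the `Device` and `Bridge` classes, the `bist` field, and `walk_bus` are hypothetical names, not the actual firmware interfaces, and real firmware works on PCI configuration space rather than on objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    name: str
    mem_bytes: int               # memory the device requests
    bist: Optional[bool] = None  # None: no BIST; True/False: BIST result
    enabled: bool = True


@dataclass
class Bridge:
    name: str
    children: List[object] = field(default_factory=list)
    window_bytes: int = 0        # filled in as the walk unwinds


def walk_bus(nodes, request_map):
    """Walk one bus; return the memory requested by enabled devices on it."""
    total = 0
    for node in nodes:
        if isinstance(node, Bridge):
            # Descend to the secondary bus, then size the bridge on the way back.
            node.window_bytes = walk_bus(node.children, request_map)
            total += node.window_bytes
        else:
            if node.bist is False:
                node.enabled = False  # failed self-test: keep it out of the map
                continue
            request_map[node.name] = node.mem_bytes
            total += node.mem_bytes
    return total


bus = [
    Device("scsi", 4096),
    Bridge("bridge0", [Device("lan", 1024, bist=True),
                       Device("bad_card", 2048, bist=False)]),
]
request_map = {}
total = walk_bus(bus, request_map)
```

In this sketch the card that fails its self-test never enters the request map, and the bridge window is computed only after everything behind the bridge has been visited.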

Firmware also supports booting the HP-UX operating system
from two built-in PCI devices. Specifically, firmware
can load the HP-UX operating system from either a disk
attached to a built-in PCI SCSI chip or from a file server 
attached to a built-in PCI 100BT LAN chip. 

* A PCI-to-PCI bridge connects two PCI buses, forwarding transactions from one to the
other.

Firmware Challenges

The first challenge in firmware was the result of another 
ambiguity in the PCI specification. The specification does 
not define how soon devices on the PCI bus must be ready 
to receive their first transaction after the PCI bus exits 
from reset. Several PCI cards failed when they were 
accessed shortly after PCI reset went away. These cards 
need to download code from an attached nonvolatile 
memory before they will work correctly. The cards begin 
the download after PCI reset goes away, and it can take 
hundreds of milliseconds to complete the download. Intel 
platforms delay one second after reset before using the 
PCI bus. This informal compliance requirement meant 
that firmware needed to add a routine to delay the first 
access after the PCI bus exits reset.

Interfacing with other ASICs implementing varying levels 
of the PCI specification creates additional challenges. 
Compliance with PCI 2.0 (PCI-to-PCI) bridges resulted in
two issues for firmware. First, the bridges added latency to
processor I/O reads. This extra latency stressed a busy 
system and caused some processor I/O reads to timeout 
in the processor and bring down the system. The firm- 
ware was changed so that it would reprogram the proces- 
sor timeout value to allow for this extra delay. The second 
issue occurs when PCI 2.0 bridges are stacked two or
more layers deep. It is possible to configure the bridges 
such that the right combination of processor I/O reads 
and DMA reads will cause the bridges to retry each other's
transactions and cause a deadlock or starve one of the 
two reads. Our system firmware fixes this problem by 
supporting no more than two layers of PCI-to-PCI bridges 
and configuring the upstream bridge with different retry 
parameters than the downstream bridge. 

Operating System Support 

The HP-UX operating system contains routines provided 
for PCI-based kernel drivers called PCI services. The first
HP-UX release that provides PCI support is the 10.20
release. An infrastructure exists in the HP-UX operating
system for kernel-level drivers, but the PCI bus introduced
several new requirements. The four main areas of direct
impact include context dependent I/O, driver attachment,
interrupt service routines (ISR), and endian issues. Each
area requires special routines in the kernel's PCI services. 



Context Dependent I/O 

In the HP-UX operating system, a centralized I/O services 
context dependent I/O (CDIO) module supplies support 
for drivers that conform to its model and consume its 
services. Workstations such as the C-class and B-class
machines use the workstation I/O services CDIO (WSIO
CDIO) for this abstraction layer. The WSIO CDIO provides
general I/O services to bus-specific CDIOs such as EISA
and PCI. Drivers that are written for the WSIO environment
are referred to as WSIO drivers. The services provided
by WSIO CDIO include system mapping, cache
coherency management, and interrupt service linkage. In
cases where WSIO CDIO does need to interface to the I/O
bus, WSIO CDIO translates the call to the appropriate bus
CDIO. For the PCI bus, WSIO CDIO relies on services in
PCI CDIO to carry out bus-specific code.

Ideally, all PCI CDIO services should be accessed only 
through WSIO CDIO services. However, there are a 
number of PCI-specific calls that cannot be hidden with
a generic WSIO CDIO interface. These functions include 
PCI register operations and PCI bus tuning operations. 

Driver Attachment 

The PCI CDIO is also responsible for attaching drivers to 
PCI devices. The PCI CDIO completes a PCI bus walk, 
identifying attached cards that had been set up by firm- 
ware. The PCI CDIO initializes data structures, such as 
the interface select code (ISC) structure, and maps the 
card memory base register. Next, the PCI CDIO calls the 
list of PCI drivers that have linked themselves to the PCI 
attach chain. 

The PCI driver is called with two parameters: a pointer 
to an ISC structure (which contains mapping information
and is used in most subsequent PCI services calls) and an
integer containing the PCI device's vendor and device IDs.
If the vendor and device IDs match the driver's interface,
the driver attach routine can do one more check to verify
its ownership of the device by reading the PCI subsystem 
vendor ID and subsystem ID registers in the configuration 
space. If the driver does own this PCI device, it typically 
initializes data structures, optionally links in an interrupt 
service routine, initializes and claims the interface, and 
then calls the next driver in the PCI attach chain. 
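
The attach chain can be modeled in a few lines. This is a simplified sketch, not the HP-UX PCI services interface: `PciDevice`, `make_attach`, and the ID values are hypothetical, and the real attach routine receives an ISC structure pointer and a combined vendor/device ID integer rather than a Python object.

```python
from dataclasses import dataclass


@dataclass
class PciDevice:
    vendor_id: int
    device_id: int
    subsys_vendor_id: int
    subsys_id: int
    owner: str = ""


def make_attach(name, ids, subsys, next_attach=None):
    """Build an attach routine for a hypothetical driver called `name`."""
    def attach(dev):
        if (dev.vendor_id, dev.device_id) == ids:
            # Second check: subsystem IDs read from configuration space.
            if (dev.subsys_vendor_id, dev.subsys_id) == subsys and not dev.owner:
                dev.owner = name  # claim the interface
        if next_attach is not None:
            next_attach(dev)      # always pass control down the chain
    return attach


# Two drivers for cards built around the same front-end chip (made-up IDs).
CHIP = (0x1234, 0x0001)
chain = make_attach("lan_a", CHIP, (0xAAAA, 0x0001),
                    make_attach("lan_b", CHIP, (0xBBBB, 0x0002)))

card = PciDevice(0x1234, 0x0001, 0xBBBB, 0x0002)
chain(card)
```

Both drivers match the vendor and device IDs here, so only the subsystem check decides which driver claims the card.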

Interrupt Service Routines

The PCI bus uses level-sensitive, shared interrupts. PCI
drivers that use interrupts use a WSIO routine to register
their interrupt service routine with the PCI CDIO. When a 
PCI interface card asserts an interrupt, the operating sys- 
tem calls the PCI CDIO to do the initial handling. The PCI 
CDIO determines which PCI interrupt line is asserted and 
then calls each driver associated with that interrupt line. 

The PCI CDIO loops, calling drivers for an interrupt line 
until the interrupt line is deasserted. When all interrupt 
lines are deasserted, the PCI CDIO reenables interrupts 
and returns control to the operating system. To prevent 
deadlock, the PCI CDIO has a finite (although large) num- 
ber of times it can loop through an interrupt level before 
it will give up and leave the interrupt line disabled. Once 
disabled, the only way to reenable the interrupt is to re- 
boot the system. 
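
The bounded dispatch loop for one shared interrupt line might look like the following sketch. The function name `service_line` and the limit `MAX_PASSES` are assumptions made for illustration; the article says only that the limit is finite and large.

```python
MAX_PASSES = 1000  # assumed value; the text says only "finite (although large)"


def service_line(line_asserted, handlers, max_passes=MAX_PASSES):
    """Call every handler on a shared line until the line deasserts.

    Returns True if the line was cleared, or False if the pass limit was
    reached and the line should be left disabled.
    """
    passes = 0
    while line_asserted():
        if passes >= max_passes:
            return False
        for handler in handlers:
            handler()
        passes += 1
    return True


# Toy model: each card has some pending work that its handler clears.
pending = {"lan": 2, "scsi": 1}


def make_handler(card):
    def handler():
        if pending[card] > 0:
            pending[card] -= 1
    return handler


handlers = [make_handler(card) for card in pending]
cleared = service_line(lambda: any(pending.values()), handlers)
stuck = service_line(lambda: True, [lambda: None], max_passes=10)
```

Because the line is level-sensitive and shared, every registered handler is called on each pass; a card that never drops its request eventually exhausts the pass limit.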

PCI Endian Issues 

PCI drivers need to be cognizant of endian issues.* The 
PCI bus is inherently little endian while the rest of the 
workstation hardware is big endian. This is only an issue 
with card register access when the register is accessed in 
quantities other than a byte. Typically there are no endian
issues associated with data payload since data payload is 
usually byte-oriented. For example, network data tends
to be a stream of byte data. The PCI CDIO provides one 
method for handling register endian issues. Another 
method lies in the capability of some PCI interface chips 
to configure their registers to be big or little endian. 
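
A small example shows why only multibyte register accesses are affected. It uses Python's standard `struct` module; the register contents and the helper names `swap32` and `swap16` are made up for illustration and are not part of the PCI CDIO.

```python
import struct


def swap32(value):
    """Reverse the byte order of a 32-bit value."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


def swap16(value):
    """Reverse the byte order of a 16-bit value."""
    return struct.unpack("<H", struct.pack(">H", value))[0]


raw = bytes([0x78, 0x56, 0x34, 0x12])    # register bytes at increasing addresses
as_big = struct.unpack(">I", raw)[0]     # what a big-endian word load sees
as_little = struct.unpack("<I", raw)[0]  # what the little-endian card means
first_byte = raw[0]                      # byte access: no ordering question
```

The big-endian load yields 0x78563412 while the card intends 0x12345678, so the driver (or a swapping access routine) must apply `swap32` to word-sized register values. Byte-at-a-time access reads the same value either way.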

Operating System Support Challenges 

We ran into a problem when third-party card developers
were porting their drivers to the HP-UX operating system.
Their drivers only looked at device and vendor identifiers
and claimed the built-in LAN inappropriately. Many PCI
interface cards use an industry-standard bus interface
chip as a front end and therefore share the same device
and vendor IDs. For example, several vendors use the
Digital 2114X family of PCI-to-10/100-Mbits/s Ethernet
LAN controllers, with each vendor customizing other
parts of the network interface card with perhaps different
physical layer entities. It is possible that a workstation
could be configured with multiple LAN interfaces having
the same vendor and device ID with different subsystem
IDs controlled by separate drivers. A final driver attachment
step was added to verify the driver's ownership of
the device. This consisted of reading the PCI subsystem
vendor ID and subsystem ID registers in the configuration
space.

* Little endian and big endian are conventions that define how byte addresses are
assigned to data that is two or more bytes long. The little endian convention places bytes
with lower significance at lower byte addresses (the word is stored "little-end-first").
The big endian convention places bytes with greater significance at lower byte
addresses (the word is stored "big-end-first").

The HP-UX operating system does not have the ability to
allocate contiguous physical pages of memory. Several
PCI cards (for example, SCSI and Fibre Channel) require
contiguous physical pages of memory for bus master task
lists. The C-class implementation, which allows virtual
DMA through TLB (translation lookaside buffer) entries,
is capable of supplying 32K bytes of contiguous memory
space. In the case of the B-class workstation, which does
not support virtual DMA, the team had to develop a
workaround that consisted of preallocating contiguous pages
of memory to enable this class of devices.



Conclusion 



PCI and Interoperability. We set out to integrate PCI into 
the HP workstations. The goal was to provide our systems 
with access to a wide variety of industry-standard I/O 
cards and functionality. The delivery of this capability
required the creation and verification of a bus interface
ASIC and development of the appropriate software support
in firmware and in the HP-UX operating system.

Challenges of Interfacing with Industry Standards. There
are many advantages to interfacing with an industry
standard, but it also comes with many challenges. In
defining and implementing an I/O bus architecture,
performance is a primary concern. Interfacing proprietary and
industry-standard buses and achieving acceptable
performance is difficult. Usually the two buses are designed with
different goals for different systems, and determining the
correct optimizations requires a great deal of testing and
redesign.

Maintaining compliance with an industry standard is
another major challenge. It is often like shooting at a moving
target. If another vendor ships large enough volumes of a
nonstandard feature, then that feature becomes a de facto
part of the standard. It is also very difficult to prove that
the specification is met. In the end, the best verification
techniques for us involved simply testing the bus interface
ASIC against as many devices as possible to find where the
interface broke down or performed poorly.

Finally, it is difficult to drive development and verification
unless the functionality is critical to the product being
shipped. The issues found late in the development cycle
for the bus interface ASIC could have been found earlier
if the system had required specific PCI I/O functionality
for initial shipments. The strategy of preenabling the
system to be PCI compatible before a large number of
devices became available made it difficult to achieve the
appropriate level of testing before the systems were
shipped.

Successes. The integration of PCI into the HP workstations 
through design and verification of the bus interface ASIC 
and the development of the necessary software components
has been quite successful. The goals of the PCI integration
effort were to provide fully compatible, high-performance
PCI capability in a cost-effective and timely manner. The
design meets or exceeds all of these goals. The bandwidth
available to PCI cards is within 98 percent of the bandwidth
available to native GSC cards. The solution was ready in
time to be shipped in the first PCI-enabled HP workstations:
B132, B160, C160, and C180.

The bus-bridge ASIC and associated software have since 
been enhanced for two new uses in the second generation 
of PCI on HP workstations. The first enhancement
provides support for the GSC-to-PCI adapter to enable specific
PCI functionality on HP server GSC I/O cards. The second
is a version of the bus interface supporting GSC-2x
(higher bandwidth GSC) and 64-bit PCI for increased
bandwidth to PCI graphics devices on HP C200 and C240 
workstations. 

Computers have had a profound effect on how companies conduct
business. They are used to run enterprise business software and to automate
factory-floor production. While this has been a great benefit, the level of
coordination between computers running unrelated application software is
usually limited. This is because such data transfers are difficult to implement,
often requiring manual intervention or customized software. Until recently,
off-the-shelf data transfer solutions were not available.

HP Enterprise Link is a middleware software product that increases the 
effectiveness of companies involved in manufacturing and production. It allows 
business management software running at the enterprise level, such as SAP's 
R/3 product, to exchange information (via electronic transfer) with software 
applications running on the factory floor. It also allows software applications 
running on the factory floor to exchange information with each other. 

HP Enterprise Link is available for HP 9000 computers running the HP-UX
operating system and PC platforms running Microsoft's Windows NT
operating system. 

This article will discuss the evolution of the link between business software 
systems and factory automation systems, and the functionality provided in HP
Enterprise Link to enable these two environments to communicate.

Background 

Initially, only large corporations could afford computers. They ran
batch-oriented enterprise business software to do payroll, scheduling, and
inventory. As the cost of computing dropped, smaller companies
began using computers to run business software, and
companies involved in manufacturing began using them
to automate factory-floor production.

Although factory-floor automation led to improved effi- 
ciency and productivity, it was usually conducted on a 
piecemeal basis. Different portions of an assembly line 
were often automated at different times and often with 
different computer equipment, depending on the capabil- 
ities of computer equipment available at the time of 
purchase. As a result, today's factory-floor computers are 
usually isolated hosts, dedicated to automating selected 
steps in production. While various factory-floor functions 
are automated, they do not necessarily communicate with 
one another. They are isolated in "islands of automation." 
To make matters worse, the development of program- 
mable logic controllers (PLCs) and other dedicated "smart" 
factory-floor devices has increased the number of isolated 
computers, making the goal of integrated factory-floor 
computation that much harder to achieve. 

While production software was generally used for smaller, 
more isolated problems, business software was used to 
solve larger company-wide problems. Furthermore, while
production software was more real-time oriented,
business software was more transaction and batch oriented.
These differing needs caused business systems to evolve
with little concern for the kind of computing found on the
factory floor. Similarly, production systems evolved with
little concern for the kind of computing found at the
enterprise level. As a result, many enterprise-level business
systems and factory-floor computers are not able to
intercommunicate. Figure 1 shows an example of the
components that make up a typical enterprise and
factory-floor environment.

The net effect is that today companies find it difficult and 
expensive to integrate factory-floor systems with each 
other and with business software running at the enterprise 
level. This is unfortunate because the dynamic nature of 
the marketplace and the desire to reduce inventory levels 
have made the need for such integration very high. 

Marketplace Dynamics 

Over the last decade, the marketplace has become
increasingly dynamic, forcing businesses to adapt ever more
quickly to changing market conditions. Computer systems
now experience a continuous stream of modifications and
upgrades. Generally, this has forced business systems to
adopt more real-time behaviors and production systems to
become more flexible. It has also increased the frequency
and volume of data transferred between business and
production systems and between the many production
systems.

Figure 1

Computing at the enterprise and factory-floor levels.

[Figure labels. Enterprise level: Enterprise Business System, Payroll, Scheduling. Factory-floor level: Component Pick-and-Place Station, Wave Solder Station, PLC Station, Mixing Machine Station, Bottle Inspector.]

May 1998 • The Hewlett-Packard Journal

© Copr. 1949-1998 Hewlett-Packard Co.

There has always been a requirement to transfer informa- 
tion between computers in an organization, both horizon- 
tally between computers at the same functional level, and 
vertically between computers at different functional levels.
In the past, manual data entry was an often-used approach. 
Hard-copy printouts generated by business management 
systems would be provided to operators who manually 
entered the information into one or more production 
systems. Although this was an acceptable approach in the 
past, such an approach is not sufficiently responsive in 
today's dynamic business environment. As a result, the
need for electronic data transfer capability between the 
various business management and production level 
computers is now very high. 

Electronic Data Transfers 

Integrated business software with built-in support for 
data transfers between components is sometimes used 
at the business management level. While this minimizes 
the effort required to exchange data between the various
components of enterprise business systems, it is often
inflexible and restrictive with regard to what can be
exchanged and when exchanges occur. 

Organizations that use a variety of business software
packages, rather than a single integrated package, have 
typically developed custom software for electronic data 
transfers between packages. Unfortunately, marketplace 
dynamics require custom software to be constantly re- 
worked. This ongoing rework forces companies to either 
maintain in-house programming expertise or repeatedly 
hire software consultants to implement needed changes. 
As a result, custom data transfer software is not only ex- 
pensive to develop but also costly to maintain — especially 
if changes must be implemented on short notice. 

On the factory floor, software programmers have been 
employed to develop custom data transfer solutions that 
allow the different islands of automation to communicate. 
As previously noted, this approach is difficult to implement
and expensive to maintain. In addition, this approach is 
often inflexible since the resulting software is usually
developed assuming that the configuration of factory-floor
systems is largely static.

When new equipment and application software are to be 
integrated into the overall system, software programmers 
don't just prepare additional custom software. They must 
also modify the existing custom software for all applica- 
tions involved. For this reason, custom software is often 
avoided, and electronic data transfer capability is fre- 
quently confined to transfers between equipment and 
software supplied by the same manufacturer. 

Differences in hardware (and associated operating
systems) and differences in the software applications
themselves cause numerous application integration problems.
Here are a few examples: 

■ Data from applications running on computers that 
have proprietary hardware architectures and operating 
systems is often not usable on other systems.

■ Different applications use different data types according 
to their specific needs. 

■ Incompatible data structures often result because of the 
different groupings of data elements by software applica- 
tions. For example, an element with a common logical 
definition in two applications may still be stored with 
two different physical representations. 

■ Applications written in different languages sometimes
interpret their data values differently. For example,
C and COBOL interpret binary numeric data values
differently.

What is needed, therefore, is an off-the-shelf product that 
is specifically designed to interconnect applications that
were not originally designed to work together. That
product must automatically, quickly, efficiently, and cost- 
effectively integrate applications having incompatible 
programming interfaces at the same or different func- 
tional levels of an organization. HP Enterprise Link is 
such a product. 

HP Enterprise Link is an interactive point-and-click
software product that is used to connect software
applications (such as business planning and execution systems)
to control supervisory systems found on the factory floor.
HP Enterprise Link greatly reduces the cost and effort
required to interconnect such systems while eliminating
the need for custom software.

The Data Transfer Problem

The problem of transferring data from one software
application to another is conceptually simple: just fetch the data
from one system and place it in another. In practice the 
problem is more complex. The following issues arise when 
trying to implement electronic data transfer solutions: 

■ There must be a way to obtain data from the software 
application serving as the data source. Such access, for 
example, might be provided by a library of callable C
functions. 

■ There must be a way to forward data to the software
application serving as the data destination. For example, 
data might be placed in messages that are sent to the 
destination application. 

■ There must be a specification of exactly what to fetch 
from the source application and exactly where to place 
it in the destination application. 

■ The data being transferred must be translated from 
the format provided by the data source to the format 
required by the data destination. 

■ There must be a specification of the circumstances
under which data should be transferred and a way to
detect when these circumstances occur. 

All of these issues are addressed in HP Enterprise Link.

HP Enterprise Link

The HP Enterprise Link product consists of the three
components shown in Figure 2.

■ An interactive configuration tool. This interactive 
window-based application allows users to direct the 
movement of data between two software applications.

■ A data server. This noninteractive process runs in the 
background. It moves data in accordance with the direc- 
tives that the user specified with the configuration tool. 

■ Configuration files. This is the set of mappings and 
trigger criteria created by users. The data is stored in 
configuration files. These files are created and modified 
by the configuration tool and read by the data server.

Linking Components 

The HP Enterprise Link components listed above have the
common goal of enabling users to create middleware that
maps components with different interfaces together for
data transfer.

Figure 2

The components of HP Enterprise Link.

In HP Enterprise Link, the combination of a single source
address and a single destination address is called a
mapping. A unit of data at the specified source address is said
to be mapped to the specified destination address. In
other words, it can be read from the specified source
address and written to the specified destination address.

Although a mapping deals with the transfer of a single 
unit of data, real-world situations usually require the
transfer of many units of data simultaneously. Therefore,
HP Enterprise Link collects mappings into groups called
methods. A method contains one or more mappings. 

Mappings describe what to transfer and where to transfer
it, whereas triggers describe exactly when to do the
transfer. Data is actually transferred whenever a specified
trigger condition is satisfied. This condition is called the
trigger criterion. There are many possible trigger criteria
such as: 

■ Whenever a unit of data at a specified source address 
changes value 

■ Whenever a unit of data at a specified source address is
set to a specified value

■ Whenever the source data becomes available, such as
arriving in a message

■ At a preset time of the day or a preset day of the week.

HP Enterprise Link considers trigger criteria to be part of
the definition of a method. All the mappings for a single
method share the same trigger criteria. Whenever the
trigger criteria are met, HP Enterprise Link transfers, in
unison, all the data specified by the method's mappings.
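
The relationship between mappings, methods, and trigger criteria can be sketched as a small data model. The class names, the dictionary-based "applications," and the address strings below are hypothetical illustrations, not the product's configuration file format or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Mapping:
    source: str       # address in the source application
    destination: str  # address in the destination application


@dataclass
class Method:
    trigger: Callable[[Dict[str, object]], bool]  # the trigger criterion
    mappings: List[Mapping] = field(default_factory=list)

    def run_if_triggered(self, src, dst):
        """Transfer every mapped value together when the criterion is met."""
        if not self.trigger(src):
            return False
        for m in self.mappings:
            dst[m.destination] = src[m.source]
        return True


# Toy source and destination name spaces (made-up addresses).
erp = {"recipe/ready": 1, "recipe/sugar_kg": 12.5, "recipe/water_l": 80}
mixer = {}

download = Method(
    trigger=lambda s: s["recipe/ready"] == 1,  # "set to a specified value"
    mappings=[Mapping("recipe/sugar_kg", "mixer/sugar_kg"),
              Mapping("recipe/water_l", "mixer/water_l")],
)
fired = download.run_if_triggered(erp, mixer)
```

All mappings in the method share the one trigger, so either every value is moved or none is.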

Multiple methods can simultaneously exist in HP Enterprise
Link. For example, a user can create one method to
transfer a particular production recipe from a business
enterprise system down to a factory-floor control system.
Conversely, raw-material consumption information for
the recipe currently in production could be transferred
periodically from the factory-floor control system up to
the business enterprise system, using a second method.

The Configuration Tool 

The HP Enterprise Link configuration tool provides users 
with a view of each software application's name space, 
and the tool graphically depicts what data to transfer and 
under what circumstances such transfers should occur 
(Figure 3).

The HP Enterprise Link configuration tool is composed
of communication objects and a graphical user interface
(GUI). Communication objects are used to obtain namespace
data that is unique to each application and to provide
application-specific windows. The configuration tool
provides the user with an easy-to-use point-and-click style 
GUI. 



Figure 3 

The HP Enterprise Link configuration tool. 

All dependencies on particular software applications are
encapsulated in communication objects. The configura- 
tion tool's communication objects provide the following 
functionality: 

■ They fetch namespace information from communicating 
software applications for presentation to the user.

■ They provide routines to create and manage application
dependent control panel widgets, such as those used 
to specify triggers unique to a particular software 
application. 

■ They provide routines to tell the GUI exactly what func- 
tionality is supported by a communication object. For 
example, can the application software serve only as a 
data source (supply data values), or can it serve as both 
a data source and a data destination (both supply and
use data values)? 

There are three important windows in the configuration 
tool's GUI: the Edit Method window, the Edit Mapping 
window, and the Trigger Configuration window. 

Edit Mapping. The Edit Mapping window is used to create 
new mappings (Figure 4). The namespaces of both the 
source software application and the destination software 
application are shown. They are graphically displayed 
as tree diagrams. This makes it easy for users to specify
which data to move where. They don't have to remember
the names of data sources or data destinations. Instead 
they just choose from the displayed list of possibilities. 
The side-by-side display of application namespaces makes 
it much easier to integrate the applications. 

Tree diagrams are used because they make large
namespaces manageable. A linear namespace display was
rejected early in the design of HP Enterprise Link because
a flat list representation would only be manageable with
software applications having a small namespace. Another 
advantage of tree diagrams is that most users are already 
familiar with them from file selector windows found in 
many software applications. 

To create a new mapping the user selects an item from 
the Mapping Source tree diagram and an item from the 
Mapping Destination tree diagram, and then clicks the Add 
Mapping button. A new mapping is added to the mapping 
table displayed on the Edit Method window (Figure 5). 

Figure 4

The Edit Mapping window. 

Multiple static mappings can be created in a single step
using branch assignments. This requires that the last
component of the source and destination addresses be identical
(so that appropriate mappings can be automatically
created). Mappings can also be automatically created at
the time methods are triggered. This is called dynamic
mapping and requires the user to specify algorithms that
can select source addresses and transform these addresses
to valid destination addresses.
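
A branch assignment can be pictured as pairing the leaves of two namespace branches by their last address component. The sketch below assumes slash-separated address strings and a function named `branch_assign`; both are illustrative choices, not the tool's actual address syntax or API.

```python
def branch_assign(source_leaves, dest_leaves):
    """Pair addresses under two branches whose last path component matches."""
    by_leaf = {addr.rsplit("/", 1)[-1]: addr for addr in dest_leaves}
    pairs = []
    for src in source_leaves:
        leaf = src.rsplit("/", 1)[-1]
        if leaf in by_leaf:
            pairs.append((src, by_leaf[leaf]))
    return pairs


source_branch = ["erp/recipe/sugar", "erp/recipe/water", "erp/recipe/notes"]
dest_branch = ["plc/mixer/water", "plc/mixer/sugar"]
pairs = branch_assign(source_branch, dest_branch)
```

Source items with no identically named destination item (`notes` here) simply produce no mapping.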

Edit Method. The Edit Method window (Figure 5) displays
a method's mappings as a two-column table titled
Mappings. Source addresses appear in the left column and
destination addresses appear in the right. The data server
transfers mapped data from source addresses to destination
addresses in the same order as the mappings are
listed in this table. The Mappings table makes mappings
both explicit and intuitive to the user.



This window allows the user to specify in which direction 
to transfer data. All of a method's mappings specify data
transfers in one direction, from one software application
to another. The Edit Method window also allows the user
to specify how to respond to errors that occur during data
transfers. This will be described later in more detail.

Trigger Configuration. The Trigger Configuration window 
is used to define trigger criteria (Figure 6). This window
displays all possible triggers to the user, as well as the
currently configured trigger criteria. The Trigger
Configuration window is designed to make setting up trigger criteria
explicit and intuitive for the user. 

Figure 5

The Edit Method window.

The Trigger Configuration window is split into three groups:
time triggers, triggers unique to the source application,
and triggers unique to the destination application. Time
triggers allow the user to specify that data mapping start
at some specified time and repeat at a specified time
interval, but be synchronized to a specified hour/minute/
second of the day/hour/minute.
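
One way to read the time-trigger rule is that firings fall on a fixed grid: a synchronization point plus whole multiples of the repeat interval, with nothing firing before the start time. The sketch below computes the next firing under that reading. The function name `next_firing`, its parameters, and the choice of midnight of the start day as the grid origin are assumptions for illustration, not details taken from the product.

```python
import math
from datetime import datetime, timedelta


def next_firing(now, start, interval, sync_offset):
    """Return the first firing time at or after both `now` and `start`.

    Firings lie on the grid: midnight of the start day + sync_offset
    + k * interval, for k = 0, 1, 2, ...
    """
    anchor = datetime(start.year, start.month, start.day) + sync_offset
    earliest = max(now, start)
    steps = math.ceil((earliest - anchor) / interval)
    return anchor + max(steps, 0) * interval


# Repeat every 15 minutes, synchronized to minute 5 of the hour.
start = datetime(1998, 5, 1, 8, 0, 0)
now = datetime(1998, 5, 1, 9, 7, 30)
firing = next_firing(now, start, timedelta(minutes=15), timedelta(minutes=5))
print(firing)  # 1998-05-01 09:20:00
```

With these values the grid is :05, :20, :35, and :50 past each hour, so the first firing after 09:07:30 is 09:20:00.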

Triggers unique to the source application, such as the
RTAP (real-time application platform) triggers shown in
Figure 6, allow data to be mapped when something interesting
happens in the source application. For the RTAP
triggers in Figure 6, interesting events include a database
value change or the occurrence of an RTAP database
alarm. Data can also be mapped when something interesting
happens in the destination application.

Thus, triggers allow data transfers to be pushed from the
source application, pulled from the destination application,
or scheduled by time.

Summary. Using the windows just described, users can
create methods with the configuration tool. These methods
specify one or more mappings and associated trigger
criteria. This information is saved in one or more configuration
files. The data server then reads these configuration
files to implement the user's methods.

Figure 6

The Trigger Configuration window.

The Data Server

The HP Enterprise Link data server is composed of communication
objects, a trigger manager, and a mapping
engine (Figure 7). Communication objects deal with the
problems of generating triggers and getting data into and
out of software applications. The trigger manager deals
with dispersing trigger configuration data, coordinating
trigger events, and notifying the mapping engine of trigger
events. The mapping engine deals with the problems of
reading configuration files, responding to triggers, mapping
source addresses to destination addresses, and transforming
the data as it is being mapped.

All software-application dependencies are encapsulated
in communication objects. Communication objects serve
as translators between external software applications and
the data server's mapping engine: they translate the
software application's native application program interface
(API) to the interface used by the mapping engine.

Figure 7

The components of the HP Enterprise Link data server.

The interface between a communication object and the
mapping engine is standardized, with all communication
objects using the same interface. For data that is being
transferred, the interface consists solely of address-value
pairs, where the address is from the application software's
namespace, and the value is encoded in a neutral
form. Thus a communication object only needs to be
aware of its own namespace and how to convert between
the software application's proprietary data formats and
the neutral HP Enterprise Link data format. For triggers,
the interface consists of well-documented interactions
between the trigger manager and the communication
objects.
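
The standardized interface can be pictured as a small abstract class that every communication object implements. The sketch below is only a guess at its shape: the class and method names, the use of a Python float as the "neutral" form, and the toy application that stores everything as text are all assumptions, since the article does not publish the real interface.

```python
from abc import ABC, abstractmethod


class CommunicationObject(ABC):
    """Hypothetical common interface seen by the mapping engine."""

    @abstractmethod
    def read(self, addresses):
        """Return a list of (address, neutral_value) pairs."""

    @abstractmethod
    def write(self, pairs):
        """Accept a list of (address, neutral_value) pairs."""


class TextStoreObject(CommunicationObject):
    """Toy application whose native format is text; neutral form is float."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def read(self, addresses):
        # Convert native text to the neutral form on the way out.
        return [(a, float(self._data[a])) for a in addresses]

    def write(self, pairs):
        # Convert the neutral form back to native text on the way in.
        for address, value in pairs:
            self._data[address] = str(value)


source = TextStoreObject({"tank.level": "21.5"})
pairs = source.read(["tank.level"])
```

Because the mapping engine would see only `read` and `write` with address-value pairs, adding support for a new application in this picture means writing one new subclass and nothing else.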

Communication objects are usually distributed. They are
split into two parts that are interconnected by a communication
channel such as a TCP/IP socket. One part of the
object is incorporated into the HP Enterprise Link data
server process, while the other runs on the same machine
as the corresponding software application. When a communication
object is not split into two parts, the object,
the data server, and the software application must run on
the same machine.

Communication objects communicate with their corresponding
software applications through whatever mechanism
is available. For example, this could be through a
serial port, shared memory, shared files, TCP/IP sockets,
or an application program interface (API).

When a communication object transfers data, it translates
data between the format used by the source software application
and the neutral format required by the mapping
engine. For example, for numeric values, a communication
object may have to translate between binary IEEE-754
floating-point format and the mapping engine's neutral
format.



In practice, not all data transfer attempts will be success- 
ful. For example, a particular source address might have 
been deleted, or a destination address may no longer 
exist. The configuration tool is used to specify what the 
mapping engine should do in this situation, and the data 
server must detect the condition and deal with it appro- 
priately. When data transfer attempts fail, the user can 
have the data server do any one of the following: 

■ Continue mapping data (ignoring the error) 

■ Abort all subsequent mappings associated with the 
current method 

■ Abort all subsequent mappings and all subsequent 
methods triggered by the current trigger event (if 
multiple methods were simultaneously triggered). 

The interface between the communication object and
the mapping engine is designed to support transaction-oriented
data transfers, using commit and rollback. This
functionality comes into play when mapping attempts fail.
It allows the data server to undo (roll back) all data transfers
done in all currently processed mappings associated
with the method's current trigger event.
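
The three error responses and the commit/rollback behavior can be sketched together. Everything named here (`OnError`, `Destination`, `run_event`) is hypothetical, and the decision to roll back whenever a method is aborted is an assumption made to keep the example small; the article does not say exactly when the data server chooses to roll back.

```python
from enum import Enum


class OnError(Enum):
    CONTINUE = 1      # keep mapping data, ignoring the error
    ABORT_METHOD = 2  # skip the rest of the current method
    ABORT_EVENT = 3   # also skip the remaining triggered methods


class Destination:
    """Toy transactional destination: writes are staged until commit."""

    def __init__(self, valid_addresses):
        self.valid = set(valid_addresses)
        self.committed = {}
        self._staged = {}

    def write(self, address, value):
        if address not in self.valid:
            raise KeyError(address)  # the address no longer exists
        self._staged[address] = value

    def commit(self):
        self.committed.update(self._staged)
        self._staged.clear()

    def rollback(self):
        self._staged.clear()


def run_event(methods, destination, policy):
    """Process every method (a list of address-value pairs) for one event."""
    for pairs in methods:
        failed = False
        for address, value in pairs:
            try:
                destination.write(address, value)
            except KeyError:
                failed = True
                if policy is not OnError.CONTINUE:
                    break
        if failed and policy is not OnError.CONTINUE:
            destination.rollback()  # undo this method's staged transfers
            if policy is OnError.ABORT_EVENT:
                return
        else:
            destination.commit()
```

For two triggered methods where the first contains one bad address, `CONTINUE` commits every good value, `ABORT_METHOD` commits only the second method, and `ABORT_EVENT` commits nothing.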

The Running Data Server 

When the HP Enterprise Link data server starts up, it reads
the configuration files that the user created with the configuration
tool. It then prepares to deal with the specified
trigger criteria, usually by notifying the appropriate
communication object to detect them. Finally, it enters an
event-driven mode, waiting for the trigger criteria of any
configured method to be satisfied.

When either a source or destination communication
object in the data server detects that a method's trigger
criteria have been satisfied, the object informs the data
server trigger manager that a method has been triggered.
This starts the mapping engine. Alternatively, if the data
server trigger manager detects that a method's time-based
trigger criteria have been satisfied, the mapping engine
starts.

When triggered, the mapping engine requests that the 
source communication object provide the current data 
values at the method's configured source addresses. The 
source communication object obtains these values from 
the software application, translates the format of all 
fetched data values to a neutral format, and passes the 
result to the mapping engine as address-value pairs, with 
one such pair for each of the method's defined mappings. 

The data server mapping engine looks up the destination 
address that corresponds to each source address. This 
lookup results in a new list of address-value pairs, with 
the address now being the destination address, and the 
value unchanged (and still expressed in the mapping 
engine's neutral format). To minimize the impact on per- 
formance, this lookup is implemented using a hash table. 

The mapping engine sends the new list of address-value 
pairs to the destination communication object. The des- 
tination communication object converts the received 
values into the format required by the destination software 
application, and writes the converted result to the speci- 
fied addresses in the destination software application. 
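
The lookup step at the center of this sequence is small enough to show directly. In the sketch below a Python `dict`, which is a hash table, stands in for the mapping table; the function name `map_pairs` and the sample addresses and values are made up for illustration.

```python
def map_pairs(source_pairs, mapping_table):
    """Swap each source address for its destination address.

    Values pass through unchanged, still in the neutral format.
    """
    return [(mapping_table[src], value) for src, value in source_pairs]


# Hypothetical mapping table: source address -> destination address.
mapping_table = {
    "rtap.tank1.level": "sap.order.fill_level",
    "rtap.tank1.temp": "sap.order.temperature",
}

# Address-value pairs as delivered by the source communication object.
source_pairs = [("rtap.tank1.level", 73.2), ("rtap.tank1.temp", 41.0)]
destination_pairs = map_pairs(source_pairs, mapping_table)
```

Each lookup takes roughly constant time on average regardless of how many mappings a method defines, which is the performance property the text attributes to the hash table.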

Communication Objects and Software Applications 

There are two fundamental ways for software applications
to provide communication objects access to their data:
the request-reply method and the spontaneous-message
method.

In the request-reply method, the communication object 
sends a software application the address of a wanted data 
unit in a request and receives its current value in a reply.
With this method the communication object controls the 
data transfer. It determines which unit of data to read and 
when to read it. Structured Query Language (SQL) and 



real-time databases are two examples of software applica- 
tions that employ the request-reply method. 

In the spontaneous-message method, communication ob- 
jects receive data, usually as messages, from the software 
application whenever the application chooses to send it. 
With this method the software application controls the 
data transfer. It determines which data to provide and 
when to provide it. SAP's R/3 product is an example of 
a software application using the spontaneous-message 
method. 

The method that a software application employs to provide 
external data access determines the trigger criteria that 
are possible for that application's communication object. 
The request-reply method allows event, value, and time- 
based trigger criteria since the communication object 
controls the data transfer. The spontaneous-message
method is limited to value-based triggering (essentially 
filtering) because the software application providing the 
data controls the data transfer. 

Spooling 

The HP Enterprise Link data server's communication
objects must cope with communication failures. This 
means that outgoing data must be locally buffered until 
a communication object verifies that the application soft- 
ware, when acting as a destination, has successfully re- 
ceived it. It also means that incoming data must either be 
safely transferred through the mapping engine or locally 
buffered when a communication object accepts data from 
the source application software. 

Spooling is especially important if the source application 
is separated from the HP Enterprise Link data server by 
a wide area network (WAN). WANs are considerably less 
reliable than local area networks, and thus are more likely 
to lose data. 

In a typical HP Enterprise Link installation the data server 
runs on a machine located near or on the factory floor. 
Production orders are downloaded from the enterprise 
level to HP Enterprise Link as soon as they are available. 
The downloaded data is buffered at the factory until it is 
needed. Using HP Enterprise Link in this way reduces the 
probability that the factory would lack unprocessed pro- 
duction orders if the WAN is down for a prolonged period 
of time.

Buffered data must be preserved even if the HP Enterprise
Link host machine is shut down or crashes. To do this, HP
Enterprise Link stores buffered data in disk-resident spool
files.

The amount of storage used to hold buffered data must be
restricted to protect the host computer from failure caused 
by insufficient resources. HP Enterprise Link can limit the 
size of spool files by controlling: 

■ The maximum size of spool storage 

■ The maximum number of messages buffered 

■ The age of the oldest message buffered. 

The user can set any one or all of these limits, using the
HP Enterprise Link configuration tool.
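
The three limits can be enforced with one check applied after every write. The sketch below keeps the spool in memory and discards the oldest messages when a limit is exceeded; both choices are assumptions made for brevity, since the article says the real spool is disk-resident and does not say which messages are dropped. The class name `Spool` and its parameters are likewise hypothetical.

```python
from collections import deque


class Spool:
    """In-memory stand-in for a size-limited spool file."""

    def __init__(self, max_bytes=None, max_messages=None, max_age=None):
        self.max_bytes = max_bytes        # maximum size of spool storage
        self.max_messages = max_messages  # maximum number of messages
        self.max_age = max_age            # maximum age of oldest message (s)
        self._queue = deque()             # (timestamp, payload), oldest first
        self._bytes = 0

    def put(self, payload, now):
        self._queue.append((now, payload))
        self._bytes += len(payload)
        self._trim(now)

    def messages(self):
        return [payload for _, payload in self._queue]

    def _over_limit(self, now):
        if not self._queue:
            return False
        too_many = self.max_messages is not None and len(self._queue) > self.max_messages
        too_big = self.max_bytes is not None and self._bytes > self.max_bytes
        too_old = self.max_age is not None and now - self._queue[0][0] > self.max_age
        return too_many or too_big or too_old

    def _trim(self, now):
        # Assumed policy: discard oldest messages until all limits hold.
        while self._over_limit(now):
            _, dropped = self._queue.popleft()
            self._bytes -= len(dropped)


spool = Spool(max_messages=2)
for i, payload in enumerate([b"a", b"b", b"c"]):
    spool.put(payload, now=float(i))
```

Any limit left as `None` is simply not enforced, matching the statement that the user can set any one or all of the limits.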

Tracing 

HP Enterprise Link allows the data being transferred 
to be monitored by the user. The monitoring is called 
tracing. Tracing is useful for creating an audit trail of the
transferred data and for debugging and testing methods. 
Tracing does not affect the data being transferred. 

The configuration tool is used to enable and disable trac- 
ing, but it is the data server that generates trace messages 
when tracing is enabled. 

Data can be traced at a number of different internal locations
within the data server (see Figure 8). Some of the
forms in which trace results can be expressed include:

■ Data as received by a data server communication object
from a source software application. This trace data is
expressed using the source software application's native
data format and includes the source address, the value
received or read, and the time of transfer.

■ Data as sent by a data server communication object to
the destination software application. This trace data is
expressed using the destination software application's
native data format and includes the destination address,
the value sent or written, and the time of transfer.

■ Data being mapped by the mapping engine. This trace
data is expressed using the data server mapping engine's
neutral data format and includes the source address, the
destination address, the value transferred, and the time
of transfer.

Error messages reported by the mapping engine or by 
communication objects can also be included in the trace 
output. This ability ensures that the relative sequencing of 
data transfer messages and error messages is preserved, 
which greatly aids the user when trying to troubleshoot 
mapping problems. 

Server Data Flow 

HP Enterprise Link allows the flow of data in the data 
server to be interrupted at a number of different internal 
points (see Figure 9). This is useful for isolating the 
effects of data mappings during debugging and testing. 
When an information flow is interrupted, data does
not pass the point of interruption; instead, the data is
discarded.

Figure 9

The flow of information being transferred from a communication
object to a software application can be interrupted.
Interrupting the flow here allows the data server
to read from mapped source addresses, map to new destination
addresses, and then discard the data just before
it would have been written to the destination software
application.

The flow of information being transferred from a software
application to a communication object can also be independently
interrupted. Interrupting the flow here allows
the data server to ignore all data sent to the communication
object by the source software application.

Data Integrity 

The HP Enterprise Link data server is carefully designed
to preserve the integrity of the data being mapped and
to map the data exactly once for each trigger event. The
design was influenced by considering how to react to 
communication channel failures and data server process 
terminations. The circumstances that could cause the 
data server process to terminate are the following: 

■ If a person or software process explicitly kills the data 
server process 

■ If the host machine suffers a hardware or software 
failure, loses power, or is manually turned off. 

Communication channel failures must be handled carefully.
If the communication channel connecting a communication
object to its software application fails, the data
being mapped at the time of failure must not be lost or
duplicated. Also, after normal operation of the communication
channel is restored, communication between the
communication object and its application must be automatically
established again and all interrupted data transfers
restarted.

The following steps are taken to ensure data integrity 
when communication channels fail: 

■ For data received from the source software application,
the communication object never acknowledges receipt 
of the data until the data has safely been saved to a 
disk-resident receive-spool file. 

■ Data received by the communication object from the 
source software application is not removed from the 
receive-spool file until the data has successfully passed 
through the mapping engine and been forwarded to the 
communication object responsible for sending it to the 
destination software application. 

■ The communication object that sends data to the destination
software application only notifies the mapping
engine that it successfully received the data after the
data has been safely saved to a disk-resident transmit-spool
file. Also, it only removes data from the transmit-spool
file when the destination software application has
acknowledged successful receipt of the data.
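
The ordering in the three steps above can be sketched as follows: data is saved before it is acknowledged, and it is removed from one spool only after it is safely held by the next stage. The `SpoolFile` class here is an in-memory stand-in for a disk-resident file, and all function names are hypothetical.

```python
class SpoolFile:
    """Stand-in for a disk-resident spool file (kept in memory here)."""

    def __init__(self):
        self.items = []

    def save(self, item):
        self.items.append(item)

    def remove(self, item):
        self.items.remove(item)


def receive(message, receive_spool, acknowledge):
    """Save incoming data first, and only then acknowledge receipt."""
    receive_spool.save(message)
    acknowledge(message)


def forward(message, receive_spool, transmit_spool, map_message):
    """Map the data, save it for transmission, then drop the original."""
    mapped = map_message(message)
    transmit_spool.save(mapped)
    receive_spool.remove(message)
    return mapped


def deliver(mapped, transmit_spool, send):
    """Remove data only after the destination confirms receipt."""
    if send(mapped):
        transmit_spool.remove(mapped)
        return True
    return False  # still spooled, so delivery can be retried later


receive_spool, transmit_spool = SpoolFile(), SpoolFile()
acks = []
receive("order-1", receive_spool, acks.append)
mapped = forward("order-1", receive_spool, transmit_spool, str.upper)
first_try = deliver(mapped, transmit_spool, send=lambda m: False)
still_spooled = list(transmit_spool.items)
second_try = deliver(mapped, transmit_spool, send=lambda m: True)
```

At every point in this sequence the message exists in at least one spool, so a crash or channel failure between steps leaves something to resume from.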
Conclusion



The HP Enterprise Link product greatly reduces the cost
and effort required to interconnect business management
systems (such as SAP's R/3 product) and measurement and
control systems (such as Hewlett-Packard's RTAP/Plus
product). HP Enterprise Link is an off-the-shelf product
that allows users to connect software applications using
an easy-to-use point-and-click graphical user interface.

With HP Enterprise Link, companies can minimize the
costs associated with changes made to computer systems
and adapt more quickly to changing market conditions.

■ http://www.tmo.hp.com/tmo/pia/Vantera/lndex/English/Products.html

HP-UX 9.* and 10.0 for HP 9000 Series 700 and 800 computers are X/Open Company UNIX 93
branded products.

UNIX is a registered trademark in the United States and other countries, licensed exclusively
through X/Open Company Limited.

X/Open is a registered trademark and the X device is a trademark of X/Open Company Limited
in the UK and other countries.

Microsoft is a U.S. registered trademark of Microsoft Corporation.
Windows is a U.S. registered trademark of Microsoft Corporation.



© Copr. 1949-1998 Hewlett-Packard Co.

May 1998 • The Hewlett-Packard Journal



Knowledge Harvesting, Articulation, and 
Delivery 



Kemal A. Delic 



Dominique Lahaix 



Harnessing expert knowledge and automating this knowledge to help solve 
problems have been the goals of researchers and software practitioners since 
the early days of artificial intelligence. A tool is described that offers a 
semiautomated way for software support personnel to use the vast knowledge 
and experience of experts to provide support to customers. 



A consequence of the global shift toward networked desktops is visible
in customer technical support centers. Support personnel are overwhelmed
with telephone calls from customers who are experiencing a steady increase in
the number of problems with intricate software products on various platforms.

Support centers are staffed with less knowledgeable (and less experienced)
first-line agents answering the simple questions and solving common problems.
Expert (and more expensive) technicians resolve more complex problems and
execute troubleshooting procedures. The work of both (the first-line agents
and the technicians) is supported by various technical tools, but they always
have to use their brains and experience to handle effectively the stream of
problems they encounter. This knowledge is seen as the key ingredient for the
efficient functioning of support centers.1

The number of calls and their complexity have both increased.
At the same time, support solution efficiency has
decreased as the cost for providing those solutions has
increased. As a result, there is a need for a knowledge
sharing solution in which the first-line agents will be able
to solve the majority of problems and escalate to the technicians
only the complex problems. To enable such a
solution, we have to:

■ Find efficient knowledge extraction methods 

■ Create compact, efficient knowledge representation 
models 

■ Use extracted knowledge directly in the customer 
support operations. 

This article describes the HP approach to providing customer
support in the Windows®-Intel business segment.
This segment includes networked desktop environments 
known for their high total cost of ownership. Help-desk 
services for this segment are supposed to solve the major- 
ity of problems with software applications, local area net- 
works, and interconnections. 

The system described here, called WiseWare,* is a knowledge
harvesting and delivery system specifically designed
to provide partially automated help for HP customer sup- 
port centers in their problem solving chores. 

Partial automation of help-desk support is seen as a suit- 
able, cost-effective solution that will: 

■ Shorten the time spent per call

■ Decrease the number of incoming calls (because of
proactive mechanisms)

■ Decrease the number of calls forwarded to the next
support level

■ Decrease the overall labor costs.

The objective is to reduce dramatically the support costs
per seat per year.

Where Is Knowledge? 

To find the most efficient knowledge extraction methods,
we must first answer the question, "Where is the knowledge?"
Books, technical articles, journals, technical notes,
reports, and product documentation are all classical
resources that rely on a human being's ability to extract,
evaluate, and apply knowledge. Mechanized efforts still
can't replace these human attributes.

* WiseWare is an internal tool and not an HP product.

Current support solutions usually are based on electronic
collections in a free-text format, in which the important
concepts are expressed using natural human language.
The latest release of WiseWare uses technical notes, frequently
asked questions, help files, call log extracts, and
user submissions as the primary raw material. According
to the knowledge resource, different knowledge representations
and extraction methods are used.

Extensive research in the field of artificial intelligence has
created several knowledge representation and extraction
paradigms in which the final use for knowledge determines
the characteristics of the representation scheme. The earliest
knowledge extraction efforts, known as information
retrieval, initially had small industrial impact. However,
recent interest in the Internet and in electronic book
collections has revived the business interest in information
retrieval. Some of the hottest products on the market today
are search engines. Different search methods (by keywords
or by concepts) are being used and other search
methods (by examples and by natural language phrases)
are being investigated. Recent synergy with artificial intelligence
methods has created a promising subfield known
as intelligent information retrieval.2 The majority of today's
customer support solutions can be classified as enriched
information retrieval systems.

Electronic Document Libraries 

Developments in the information retrieval field have transformed
free-text collections into more refined collections
known as electronic document libraries. Electronic docu- 
ment libraries have an articulated structure (author, sub- 
ject, abstract, and keywords), enabling efficient searches 
and classification. They combine advanced technological 
methods (such as hypertext and multimedia) to fit users' 
information retrieval needs. Some of the best support 
solutions today are in a digital library class and represent 
sophisticated document management systems. 

Glossary

Cluster. Natural association of similar concepts, words, and
things.

Concept. Group of words conveying semantic content. It can be
described graphically as relationships between words having
different attributes (and in some cases as numerical measurements
of strength).

Data Mining. Collective name for the field of research dealing
with data analysis in large data depositories. It includes statistics,
machine learning, clustering, classification, visualization,
inductive learning, rule discovery, neural networks, Bayesian
statistics, and Bayesian belief networks.

Information Retrieval. Identification of documents or information
from the collection that is relevant for the particular
information need.

Keyword. Characteristic word that may enable efficient retrieval
of relevant documents. Two criteria used to assess the
value of a keyword are the number of documents retrieved and
the number of useful documents (recall and precision).

Knowledge. Group of interrelated concepts used to describe
a certain domain of interest. Complex structures formed by
emulating human behavior in certain activities (for example,
assessment, problem solving, diagnosing, reasoning, and inducing).
Different schemes are used to enable knowledge
representation such as rules, conceptual graphs, probability
networks, and decision trees. Knowledge is found in large text
collections and is biologically resident in human brains.

Knowledge Map. Graphical display of interrelated concepts.

Knowledge Base. Complex entity typically containing a
database, application programs, search and retrieval engines,
multimedia tools, expert system knowledge, question and
answer systems, decision trees, case databases, probability
models, causal models, and other resources.

Metrics. Group of measurement methods and techniques
introduced to enable quantification of processes, tools, and
products.

Natural Language Processing. Activity related to concept
extraction from, formalization of, and methods deployment in
a problem area.

Paradigm. A theoretical framework of a discipline within
which theories, generalizations, and supporting experiments
are formulated.

Problem Domain. Area of interest defined by terminology,
concepts, and related knowledge.

Search. Activity guided by a find and match cycle in which a
search space is usually explored with an appropriate choice of
search words (keywords). Advanced search is done by concepts.

Case-Based Retrieval

Early hardware support documentation contained troubleshooting
diagrams that made it possible for service technicians
to troubleshoot equipment consistently by following
the diagrams and performing the appropriate tests and
measurements. The recent revival of these diagrams is
seen in interactive troubleshooting systems that enable PC
hardware technicians to solve hardware problems. So far,
such systems are implemented as case-based retrieval (or
reasoning) systems. The majority of these systems provide
only retrieval; just a few include the reasoning component.
The case-based retrieval paradigm is based on the human
ability to solve problems by remembering previously
solved problems. The support system plays the role of an
electronic case database in which the knowledge consists
of documented experience (cases). Creation and maintenance
of the cases is an expensive and nontrivial process.
Currently, these activities are performed by humans and
are used mainly for hardware support. Such systems
cannot deal efficiently with large, complex, and dynamic
problem areas.
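
To make the retrieval half of this paradigm concrete, the sketch below ranks stored cases by how many words they share with a new problem description. The word-overlap (Jaccard) score, the function names, and the two sample cases are all assumptions chosen for illustration; the article does not say how any particular support system scores its cases.

```python
def tokenize(text):
    """Split a description into a set of lowercase words."""
    return set(text.lower().split())


def retrieve(problem, cases, top=1):
    """Return the `top` stored cases most similar to `problem`."""
    query = tokenize(problem)

    def score(case):
        words = tokenize(case["problem"])
        union = query | words
        return len(query & words) / len(union) if union else 0.0

    return sorted(cases, key=score, reverse=True)[:top]


# Hypothetical case database: documented problems with known solutions.
cases = [
    {"problem": "pc does not boot after cmos checksum error",
     "solution": "Clear the CMOS memory and run Setup."},
    {"problem": "printer prints garbled characters",
     "solution": "Reinstall the printer driver."},
]

best = retrieve("cmos checksum error at boot", cases)[0]
```

This only remembers and matches; it does no reasoning about a problem that resembles no stored case, which is the limitation the text notes for retrieval-only systems.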



Rule-Based Systems 

Some support centers have tried to use expert systems 
based on rules, but they have discovered that the rule- 
based systems are difficult to create, maintain, and 
keep consistent. Crafting a collection of rules is a com- 
plex chore. It is not clear if this technology will have a 
role in future knowledge representation and extraction 
development. 

Model-Based Systems 

A model-based paradigm in which various statistical,
causal, probability, and behavioral models are used is
another example of knowledge representation for customer
support. The knowledge here is expressed by the
fault/failure model that contains quantified relationships
between causes, symptoms, and consequences. Basic
decision making is enabled with such models. Although
some limited experiments with this highly sophisticated
knowledge representation paradigm have been done, no
system is in operational use in support centers.

New Research 

The newest research in the field of data mining and know- 
ledge discovery' may offer some potentially effective 
knowledge representation methods for deployment in 
customer support centers. This research aims at the 
extraction of previously unknown patterns (insights) from 
the existing data repositories. Research in artificial intelli- 
gence has identified the initial assembly of a low-cost 
knowledge base as a potential "engineering bottleneck." 
The knowledge authoring environment discussion later 
in this article addresses that issue. Because most of the 
knowledge for WiseWare comes from text sources, we 
will focus our attention here on the knowledge extraction 
process. 

WiseWare and Knowledge Refinement 

Knowledge is a fluid, hard-to-define but essential ingredient
for all human intellectual activities. It is difficult to
extract, articulate, and deploy. The prevailing quantity of 
knowledge is encoded in the form of text (90 percent) 
expressed in natural language and is articulated as a web 
of interrelated concepts. A goal of research in natural lan- 
guage is to enable automatic and semiautomatic extrac- 
tion of knowledge. Content analysis must be automated to 
efficiently provide suggestions and solutions for users. As 
we have already seen, several knowledge representation 
paradigms are being invented and investigated (for 
example, semantic nets, rules, cases, and decision trees). 
Additionally, we can deploy various techniques to extract 
concepts (symbolic knowledge) and numerical quantities 
(numerical and statistical knowledge). 

Refinement Process 

Human experts use spreadsheets, outline processors, and
some vendor-specific tools to refine source text, but have
not yet developed systematic, efficient processing methods.
In the future, we would like to automate some phases of 
this process, leading toward more efficient and effective 
deployment. 

Knowledge refinement is seen as a process for converting
raw text into coherent, compact, and effective knowledge
forms suitable for software problem solving and assistance
(for example, decision trees, rules, probability models,
and semantic nets). The basic raw material (the knowledge
in its primary form) remains accessible. This preserves
previous investments in knowledge and enables integration
into future, more sophisticated solutions.

We can describe the knowledge refinement process as 
efforts made to transform raw text to a compact represen- 
tation and then to operational knowledge. Associated 
costs increase as raw text moves through the refinement 
process to become operational knowledge. 

Currently, WiseWare content is partitioned into three conceptual
categories: fixes, step notes, and technical notes.
The first two contain shallow, specific knowledge and the
third contains complex technical concepts. A fix is a simple,
short document that describes with fewer than 100
words a known and recurring problem with a known
solution, the fix (see Figure 1a). A fix often helps the
customer out of the immediate problem but does not provide
a long-term solution. It is essentially a "quick fix."

A step note usually walks the user through a procedure
that prevents the problem from occurring in the future
(see Figure 1b). The step note requires more of the user's
time to solve the immediate problem than the fix does,
but it saves time in the future.

Both fixes and step notes offer additional references. 
Those references contain keywords providing links to
technical notes that explain the most relevant related
subjects in depth. Technical notes require deep technical
knowledge to be properly understood and applied. 

The whole collection of fixes, step notes, and technical 
notes is tagged to associate the content of each document 
with the proper problem classes. Consequently, WiseWare 
content is perceived by the user as a repository of advice 
and solutions for given problems (quick fixes, step-by-step
procedures, and technical theory). 

Some generic activities in the refinement process can be 
denoted as: 

■ Assessment 

■ Extraction 

■ Filtering 

■ Summarization 

■ Clustering 

■ Classification. 
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Figure 1 

Two WiseWare screens: fa) WiseWare fix screen, fb) WiseWare step note screen. 
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TO CLEAR THE CMOS MEMORY: 




• 1. Switch off the PC and remove the cover.

• 2. Set the system board switch 6 (CMOS STATUS) on the switch block to
CLOSED to clear the configuration.

• 3. Switch on the PC to erase the CMOS memory.

• 4. Wait until the PC has started. The screen will flash with the message
"Configuration has been cleared, set switch 6 to the OPEN position before
rebooting."

• 5. Switch off the PC.

• 6. Set the system board switch 6 (CMOS STATUS) on the switch block to
OPEN to re-enable the configuration.

• 7. Replace the cover.

• 8. Switch on the PC. An error message will be displayed: "System CMOS
checksum bad - run SETUP". The PC will stop.

• 9. Run Setup by pressing F2. CMOS default values will be automatically
downloaded and saved.

• 10. Make any other changes you want and press F3 to save the configuration
and exit from Setup.

We can describe the evolution of WiseWare as going from
answering questions to giving advice and finally to problem
solving and troubleshooting. The support costs in this
evolution have escalated as the problems have become
more complex.

Knowledge Authoring Environment 

Since a critical mass of knowledge can be reached only
if multiple authors contribute to the knowledge base, the
knowledge authoring environment must be able to deal
with multiauthor issues effectively. Additionally, because
the knowledge authoring environment is deployed on a
worldwide basis, the issue of different languages is
relevant as well. Finally, deployment in different time zones
requires very high reliability and availability of the
knowledge authoring environment.

The quality of the knowledge is constantly monitored and
refined. Areas for improvement are pinpointed by
analyzing results reported in the knowledge base logs. As weak
points are identified and strengthened, better system
performance will help to optimize return on investment
figures. Even user satisfaction can be assessed from the
various logs and usage traces that will reflect a combined
measure of system quality and usefulness.

Future worldwide cooperation among support centers 
to share knowledge is our objective. Ideally, each center 
will deploy and create the necessary knowledge locally. 
Centers operate in different time zones, have different 
cultural and social contexts, and have the ability to manip- 
ulate huge amounts of data, information, and knowledge. 
Coordinating the knowledge bases for all support centers
poses several challenging problems. The complexity of
these problems is reduced by careful engineering and
incremental deployment. The result is low-cost,
knowledge-based support, adding new value to the support
business.

In a very advanced situation, and from a long-term
perspective, extracted knowledge will become the crucial
ingredient for the next development phase. In this phase,
human mediation in problem solving could be removed.
Support could be delivered electronically without human
intervention. For example, imagine intelligent agents
traveling over the network to the troubled system to fix
a problem.¹ Current viruses on the Internet are doing
exactly the opposite task. What if the trend were reversed?
Support knowledge could be adapted so that healing
viruses could travel through a system, delivering remote
fixes. To understand how this could become a reality, let's
review the history of WiseWare.

WiseWare Architecture 

In November of 1995, the first challenge was posed to the
WiseWare team when the French call center decided to
outsource low-end software support services. Their
support personnel had no computer technology
background and had poor English language skills.
The knowledge department in HP's Software Services
Division in Europe responded to the challenge and
delivered the first operational WiseWare solution in April of
1996. Since then, new releases have been issued every two
months with steady improvements.

In WiseWare release 4.1, mirroring intranet servers
(Europe and the United States) cover three super regions.
The number and quality of accessible documents is
constantly improved, while use of the system is closely
monitored from access and search logs. We have
established close links with software vendors who allow us
privileged access to their documents. (The legal
framework for cooperation and alliances is defined as well.) All
activities and services undergo quality assurance scrutiny
in preparation for ISO-9000 certification.

WiseWare provides approximately 80,000 documents to 13
call centers worldwide. The average problem resolution
assistance rate is over 30 percent. More than 40 products
are covered in the various types of documents offering
quick fixes for agents and in-depth technical knowledge
for advanced WiseWare users.

WiseWare is a distributed system with three major parts: 
production, publishing, and monitoring (see Figure 2). 
They are implemented on UNIX and Windows NT
platforms, with intranet technology providing the necessary
glue for client/server solutions. It is a nonstop, highly
available system. The key advantage of the WiseWare
system lies in the tight loop between the monitoring and 
production areas in which the principal objective is to 
provide users with highly adaptable documents for every- 
day problem-solving chores. Data mining and natural 
language processing modules dynamically create user, 
problem, and document profiles that will then drive the 
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Figure 2 

WiseWare system architecture. 




DM-Lib = Data Mining Library 

NLP-Lib = Natural Language Processing Library 
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production side, enabling technical and business insights 
to be gleaned from large and extensive access and search
logs.

At this time, customers call the express hubs and explain
their problems to support personnel, using natural
language constructs that sometimes blur the real nature of
the problem. According to their understanding, support
personnel create and launch a search phrase. It is a
Boolean construct containing relevant keywords or
free-text phrases that roughly represent the problem. Different
search, hit, and presentation strategies are currently used,
but forming an effective search query and reducing
the number of replies to the relevant ones remain largely
unresolved. A mixture of artificial intelligence techniques and
traditional information retrieval and database methods is
being considered as a potential solution.

Table 1 shows how one, two, and three words in a typical
search phrase can influence the number of relevant
documents returned with the current version of WiseWare. A
well-formed phrase helps to quickly pinpoint relevant
documents while retaining necessary coverage of the problem
area. Notice the quick decrease in the number of relevant
documents returned as the phrase becomes longer.
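
The narrowing effect of longer Boolean phrases can be illustrated with a minimal sketch. It is not WiseWare's search engine: the four-document collection, the function names, and the plain AND-of-keywords matching are all assumptions made for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search_all(index, phrase):
    """Boolean AND: ids of documents containing every word in the phrase."""
    words = phrase.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits

# Hypothetical miniature collection (not WiseWare data).
DOCS = {
    1: "printer driver install fails",
    2: "printer prints blank pages",
    3: "printer driver update steps",
    4: "modem driver install steps",
}
INDEX = build_index(DOCS)
for phrase in ("printer", "printer driver", "printer driver install"):
    print(phrase, "->", sorted(search_all(INDEX, phrase)))
```

Each added keyword can only shrink the hit set, which is the behavior the table describes; a production system would also need ranking, stemming, and free-text matching.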



Support center personnel work under time-pressured,
stressful circumstances. As a result, the whole
human-computer interaction issue must be carefully considered.
Efficiently delivering advice and problem-solving
assistance can depend on the smallest detail. Besides the
quality of the material in the supporting knowledge base,
questions regarding query formulation and presentation of
the retrieved information will influence final acceptance
by the users. Support activities can be treated as
symbiotic human-machine problem solving in a bidirectional
learning paradigm. The user learns how to manipulate the
system (facilitated by language features such as
localization and query wizards). At the same time, the system
adapts to the user's methods of accessing the knowledge
base. The WiseWare system learns user behavior from
access and language patterns. Interaction with the system
customizes the environment to suit the specific user's
profile. The reasoning activity is still done by humans and
is supported by refined electronic collections. Good
synergy and efficient functioning of such human-computer
systems are the current objectives.

Because the support centers are located in different geo- 
graphical, cultural, and language areas, the natural lan- 
guage layer is seen as crucial for search and presentation. 
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Table 1 
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Technological advances in visual search and delivery 
combined with audio and video techniques may improve 
the quality and efficiency of the system. Better architec- 
ture combined with object-oriented (multimedia) data- 
bases will add another dimension to the delivery phase. 
These improvements will be made over time and will be
accelerated by technological developments in related
fields.



Conclusion 



Accessible knowledge is the essential ingredient for suc- 
cessfully dealing with the rising quantity and complexity 
of customer support calls. A semiautomated system with 
refined knowledge in reusable forms can enable users to 
share knowledge among different, geographically dis- 
persed customer support centers. The overall objective 
of HP's WiseWare server is to provide low-cost, effective
customer support. This is a simple objective but one that
is difficult to achieve, especially when significant effort
and investment are required to achieve technological
breakthroughs in the problem-solving field.¹

In the short term, incremental deployment of advanced 
methods such as data mining and natural language pro- 
cessing techniques will improve system quality and usage. 
In the long run, it is very likely that most of the client-hub
telephone voice communication will be gradually replaced
by computer-computer communication. Several layers of
the present problem-solving architecture will disappear
or will be replaced by some new elements. The
problem-solving knowledge along with search and access log
collections being developed now will serve as the
fundamental basis for future electronic support.
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A Theoretical Derivation of Relationships 
between Forecast Errors 



Jerry Z. Shan



This paper studies errors in forecasting the demand for a component used by 
several products. Because data for the component demand (both actual demand 
and forecast demand) at the aggregate product level is easier to obtain than at 
the individual product level, the study focuses on the theoretical relationships 
between forecast errors at these two levels.

With a sound theoretical foundation for understanding forecast errors, a
much more confident job can be done in forecasting and in related planning
work, even under uncertain business conditions.¹

In a typical material planning process, planners are constantly challenged by 
forecast inaccuracies or errors. For example, should a component forecast 
error be measured for each platform for which it may be needed, or should its 
forecast accuracy be measured at the aggregate level, across platforms? What 
is the relation between the two accuracy measures? 

This paper describes a theoretical study of forecast errors. First, we formally
define forecast errors with different rationales, derive several relationships
among them, and prove a heuristic formula proposed by Mark Sower.¹ Then
we study the effects of a systematic bias on the forecast errors. Finally, we
extend our study to the situations where correlations across product demands
and time effects in demand and forecast are taken into account. Definitions
and theorems are presented first, and proofs of the theorems are given at the
end of the paper.

Basic Concepts 

Consider the case of a component that can be used for the manufacture of n
different products, or platforms. For platform i ($1 \le i \le n$), denote by $F_i$ the
forecast demand for the component, and by $D_i$ the actual demand. In the
treatment of forecast and actual, we propose in this paper the following
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framework: Regard forecast demand as deterministic, or 
predetermined, and actual demand as stochastic. By 
stochastic, we mean that given the same operating envi- 
ronment or experimental conditions, the actual demand 
can be different from one operation run to another. Thus, 
we can postulate a probability distribution for it. 

For a generic case, denote by D the actual demand and by
F the forecast. We call the forecast unbiased if $E(D) = F$,
where $E(D)$ denotes the expectation, or expected value,
of D with respect to its probability distribution. Practically
speaking, this unbiased requirement means that over many
runs under the same operating conditions, the average of
the realized demand is the same as the forecast. If there is
a deterministic quantity $b \ne 0$ such that $E(D) = F + b$, then
we say the forecast is biased, and the bias is b. In
practice, this means that there is a systematic departure of the
average realized demand from the forecast.

Throughout the paper, we often make the normality
assumption on the demand, that is, for unbiased forecasts,
we assume that the demand D has a normal (Gaussian)
distribution with mean F and standard deviation $\sigma$, that is,
$D \sim N(F, \sigma^2)$. Is this a reasonable assumption in reality?
The answer is yes. First of all, this assumption is
technically equivalent to assuming that the difference $\varepsilon = D - F$
between the actual demand D and the forecast F is
normally distributed: $\varepsilon \sim N(0, \sigma^2)$. The validity of this latter
assumption is based on the fact that the difference
between the actual demand and a good forecast is some
aggregation of many small random errors, and on the central
limit theorem, which states that the aggregation of many
small random errors has a limiting normal distribution.

Unbiased Forecast Case 

In this section, we assume unbiased forecasting at all
platforms.¹ Statistically, $E(D_i) = F_i$, where $F_i$ is the
forecast for the common component at platform i, and $D_i$ is
the actual demand of the component at platform i.

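Definition 1: (Same Weight Mean Based) Define $E_\pi = E(e_\pi)$
to be the forecast error at the mean (average) platform
level, and $E_a = E(e_a)$ to be the forecast error at the
aggregate platform level, where:

$$e_\pi = \frac{1}{n}\sum_{i=1}^{n}\frac{|D_i - F_i|}{F_i} \tag{1a}$$

and

$$e_a = \frac{\left|\sum_{i=1}^{n} D_i - \sum_{i=1}^{n} F_i\right|}{\sum_{i=1}^{n} F_i} \tag{1b}$$
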

The rationale for defining the forecast error at the
mean level and at the aggregate level is as follows. Let
$e_i = |D_i - F_i|/F_i$. Then $e_i$ measures, in terms of the relative
difference, the forecast error at a single platform i.
Accordingly, $e_a$ measures the forecast error, also in terms
of the relative difference, at the aggregate level from all
platforms, and $e_\pi$ provides an estimate for the forecast
error at any individual platform since it is the average of
the forecast errors over all individual platforms. Because
all the quantities in equation 1 are stochastic, we take
expectations to get their deterministic means. Now, a
natural question is: What is the relation between the
errors at the two different levels?

Theorem 1: Based on definition 1, and assuming that
$D_i \sim N(F_i, \sigma^2)$, i = 1, 2, ..., n, and that the $D_i$ are
uncorrelated (strictly speaking, we also need the joint normality
assumption, which in general can be satisfied), we have:

1. $E_\pi = \sqrt{n}\,E_a C_n$, where:

$$C_n = \left(\frac{1}{n}\sum_{i=1}^{n}\frac{1}{F_i}\right)\left(\frac{1}{n}\sum_{i=1}^{n}F_i\right) \tag{2}$$

2. It is always true that $C_n \ge 1$, and $C_n = 1$ if and only if the
forecasts across all the platforms are the same.
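
Theorem 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the forecasts and standard deviation are made-up values, the function names are hypothetical, and $C_n$ is taken to be the mean of $1/F_i$ times the mean of $F_i$. It estimates both errors of definition 1 by Monte Carlo simulation and compares the simulated mean-level error with $\sqrt{n}\,E_a C_n$.

```python
import math
import random

def c_n(forecasts):
    """C_n: (mean of 1/F_i) times (mean of F_i)."""
    n = len(forecasts)
    return (sum(1.0 / f for f in forecasts) / n) * (sum(forecasts) / n)

def simulate_errors(forecasts, sigma, trials=100_000, seed=0):
    """Monte Carlo estimates of the mean-level and aggregate-level errors."""
    rng = random.Random(seed)
    n = len(forecasts)
    total_f = sum(forecasts)
    sum_pi = 0.0
    sum_a = 0.0
    for _ in range(trials):
        # Deviations D_i - F_i for uncorrelated, unbiased normal demands.
        devs = [rng.gauss(0.0, sigma) for _ in forecasts]
        sum_pi += sum(abs(d) / f for d, f in zip(devs, forecasts)) / n
        sum_a += abs(sum(devs)) / total_f
    return sum_pi / trials, sum_a / trials

forecasts = [100.0, 200.0, 400.0]  # illustrative values only
e_pi, e_a = simulate_errors(forecasts, sigma=20.0)
predicted = math.sqrt(len(forecasts)) * e_a * c_n(forecasts)
print(f"simulated E_pi = {e_pi:.4f}, sqrt(n) * E_a * C_n = {predicted:.4f}")
```

The two printed values should agree up to sampling noise, and `c_n` returns 1 only when all forecasts are equal.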

We note that in the definition for $E_\pi$, we used the same
weight, 1/n, for all platforms. If instead we use a weight
proportional to the forecast at the platform, then we have
the following:

Definition 2: (Weighted Mean Based) Define $E_\pi = E(e_\pi)$
and $E_a = E(e_a)$, where:

$$e_\pi = \sum_{i=1}^{n}\frac{F_i}{\sum_{j=1}^{n}F_j}\cdot\frac{|D_i - F_i|}{F_i} = \frac{\sum_{i=1}^{n}|D_i - F_i|}{\sum_{i=1}^{n}F_i} \tag{3a}$$

and

$$e_a = \frac{\left|\sum_{i=1}^{n} D_i - \sum_{i=1}^{n} F_i\right|}{\sum_{i=1}^{n} F_i} \tag{3b}$$

Theorem 2: Based on definition 2 and with the same
assumptions as in theorem 1, we have:

$$E_\pi = \sqrt{n}\,E_a. \tag{4}$$




Mark Sower¹ proposed this heuristic formula. Theorem 2
says that under suitable conditions, equation 4 holds
exactly.

Other researchers have addressed a similar problem from
the perspective of demand variability. In measuring the
relative errors of the forecast at the individual platforms,
it was assumed that $\sigma_i/\mu_i$ (i = 1, 2, ..., n) are the same,
where $\sigma_i$ is a measure of demand variability and $\mu_i$ is the
mean demand at platform i. The advantage here is we do
not need to make such a strong assumption. In fact, our
measure of the forecast error at the individual platform
level can be interpreted as the forecast error at an
averaged individual platform.

The following definition of error is based on this
observation in practice. The standard deviation of a random
variable can be very large if the values this random variable
takes on are very large. A more sensible error measure of
such a random variable would be the relative error rather
than the absolute error. So, given a random variable X,
we can measure its error by the coefficient of variation
$cv(X) = \sigma(X)/E(X)$ rather than by its standard deviation
$\sigma(X)$.

With the unbiased forecast assumption, the forecast error
at platform i can be measured by $cv(D_i)$. The average of
these coefficients over all platforms is a good measure of
the forecast error at the individual platform level. On the
other hand, $\sum_{i=1}^{n} D_i$ is the demand from all platforms, and
$\sum_{i=1}^{n} F_i$ is the corresponding forecast, so
$cv\left(\sum_{i=1}^{n} D_i\right)$ is a good measure of the forecast
error at the aggregated platform level.




Definition 3: (CV Based) Define:

$$E_\pi = \frac{1}{n}\sum_{i=1}^{n} cv(D_i) \quad\text{and}\quad E_a = cv\left(\sum_{i=1}^{n} D_i\right).$$




Theorem 3: Based on definition 3, and assuming that the
$D_i$ are uncorrelated, we have

$$E_\pi = \sqrt{n}\,E_a C_n \tag{5}$$

where $C_n$ is defined in equation 2. For theorem 3, we do
not have to assume normality to get the relevant results.
This is also true for theorem 4.
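
As a quick numerical check of this relationship, consider a hypothetical case (values chosen only for illustration) with $n = 2$, forecasts $F_1 = 100$ and $F_2 = 300$, a common standard deviation $\sigma = 10$, and $C_n$ taken as the mean of $1/F_i$ times the mean of $F_i$:

$$E_\pi = \frac{1}{2}\left(\frac{10}{100} + \frac{10}{300}\right) = \frac{1}{15}, \qquad
E_a = \frac{\sqrt{2}\cdot 10}{400}, \qquad
C_n = \frac{1}{2}\left(\frac{1}{100} + \frac{1}{300}\right)\cdot\frac{100 + 300}{2} = \frac{4}{3}$$

$$\sqrt{n}\,E_a C_n = \sqrt{2}\cdot\frac{\sqrt{2}\cdot 10}{400}\cdot\frac{4}{3} = \frac{1}{15} = E_\pi .$$

Only means and variances enter the calculation, which is why normality is not needed here.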

General Case: The Effect of Bias 

We assume here that forecasts are consistently biased.
This is expressed as $E(D_i) = F_i + b$, where b denotes the
common forecast bias. This indicates that $F_i$
overestimates demand when $b < 0$ and underestimates demand
when $b > 0$.

Can we extend the use of definition 3 for the forecast
errors to this general case? The answer is no. This is
because the standard deviation is independent of bias, and
therefore one could erroneously conclude that the
forecast error is small when the standard deviation is small,
even though the bias b is very significant. Instead, the
forecast error now should be measured by the functional:

$$e(D, F) = \frac{\sqrt{E\left([D - F]^2\right)}}{F} \tag{6}$$

rather than by the cv, which is $\sqrt{E\left([D - E(D)]^2\right)}/E(D)$.
Hence, in parallel with definition 3, we have the following
definition.

Definition 4: (e-Functional Based) Define:

$$E_a = e\left(\sum_{i=1}^{n} D_i, \sum_{i=1}^{n} F_i\right) \quad\text{and}\quad E_\pi = \frac{1}{n}\sum_{i=1}^{n} e(D_i, F_i),$$

where the functional e is defined in equation 6.

If the bias b = 0, then the functional e in equation 6 is
the same as the cv, and hence definitions 3 and 4 are
equivalent.

Theorem 4: Based on definition 4, and assuming that
$D_i \sim (F_i + b, \sigma^2)$,* i = 1, 2, ..., n, and that the $D_i$ are
uncorrelated, we then have:

$$E_\pi = \sqrt{n}\,E_a C_n \sqrt{\frac{\sigma^2 + b^2}{\sigma^2 + nb^2}} \tag{7}$$

where $C_n$ is given in equation 2.
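
The effect of bias in theorem 4 enters through a single multiplying factor, $\sqrt{(\sigma^2 + b^2)/(\sigma^2 + nb^2)}$, which follows from $e(D_i, F_i) = \sqrt{\sigma^2 + b^2}/F_i$ at each platform and $\sqrt{n\sigma^2 + (nb)^2}/\sum F_i$ at the aggregate level. A minimal sketch of how the factor behaves, with a hypothetical function name and illustrative parameter values:

```python
import math

def bias_factor(sigma, b, n):
    """Bias multiplier: sqrt((sigma^2 + b^2) / (sigma^2 + n * b^2))."""
    return math.sqrt((sigma**2 + b**2) / (sigma**2 + n * b**2))

# Illustrative values only: sigma = 10, n = 4 platforms, growing bias.
for b in (0.0, 5.0, 20.0):
    print(f"b = {b:5.1f}  factor = {bias_factor(10.0, b, 4):.4f}")
```

The factor equals 1 when b = 0 and shrinks as the bias grows relative to the standard deviation, so the ratio of the mean-level error to the aggregate-level error is smaller with bias than without it.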

Since definition 1 considers the relative difference be- 
tween the forecast and the actual, any bias in the forecast 
will be retained in the difference, so there is no problem 
in using this definition even if there is bias. However, the 
relation between the two errors has changed. 

Theorem 5: Based on definition 1 and the assumption that
$D_i \sim N(F_i + b, \sigma^2)$, i = 1, 2, ..., n, and that the $D_i$ are
uncorrelated, we have:

$$E_\pi = \sqrt{n}\,E_a C_n\,
\frac{\sqrt{\frac{2}{\pi}}\,\sigma e^{-b^2/2\sigma^2} + b\left[2\Phi(b/\sigma) - 1\right]}
{\sqrt{\frac{2}{\pi}}\,\sigma e^{-nb^2/2\sigma^2} + \sqrt{n}\,b\left[2\Phi(\sqrt{n}\,b/\sigma) - 1\right]}, \tag{8}$$

where $C_n$ is defined in equation 2, and $\Phi(x)$ is the
cumulative distribution function of the standard normal
distribution N(0, 1) at x.

If there is no bias in the forecasting, the relationships
between the errors at the two levels are exactly the same for
definitions 1 and 3: $E_\pi = \sqrt{n}\,E_a C_n$. This formula, with the
introduction of the constant $C_n$, is slightly different from
the hypothesized equation 4. As noted in theorem 1, it is
always true that $C_n \ge 1$. If we use definition 2, then
equation 4 holds exactly.

If there is bias in the forecasting, then in each relationship
formula (equation 7 or equation 8), there is another
multiplying factor that reflects the effect of the bias. One can
easily find that both of these multiplying factors are less
than or equal to 1. This implies that, compared to the
error at the component level, the error at the
platform-component level when forecast bias exists is less than
when the forecast bias does not exist.

If bias does exist, as it does in reality, it seems that the
multiplying factor resulting from bias in either equation 7
or equation 8 should be taken into consideration, with
suitable estimation of the parameters involved.

* The notation $X \sim (\mu, \sigma^2)$ means that X has mean $\mu$ and standard deviation $\sigma$ but is not
necessarily normally distributed.



Correlated Demands 

It is reasonable to assume that demand for a component
for one platform affects demand for this component for
another platform. Also, for a given platform, there is
usually a strong correlation between the current demand and
the historical demands. The forecast is usually made
based on the historical demands. In this section, we first
propose a correlated multivariate normal distribution
model for the demand stream when the platform is
indexed, and then propose a time-series model for the
demand and forecast streams when time is indexed. Our
goal is to expand our study of the relationship between
the two layers of forecast errors in the presence of
correlations. Throughout this section, we assume unbiased
forecasts, and use the weighted average definition
(definition 2) for the forecast error.

Correlated Normal Distribution Model at a Time Point. In
this subsection we consider the case where there is
correlation across platform demands, but we still assume
that time does not affect demand. Suppose that the
demand stream $D_i$, i = 1, 2, ..., n, can be modeled by a
correlated normal distribution such that $D_i \sim N(F_i, \sigma^2)$ for i = 1,
2, ..., n, and that there is a correlation between different $D_i$
expressed as $Cov(D_i, D_j) = \sigma^2\rho_{ij}$ for $1 \le i \ne j \le n$. With this
assumption on the demand stream, we have the following
result.

Theorem 6: Based on definition 2 and the above
correlated normal distribution modeling for the demand
stream, we have:

$$E_\pi = \frac{n}{\sqrt{n + \sum_{i \ne j}\rho_{ij}}}\,E_a. \tag{9}$$

In particular, if $\rho_{ij} = \rho$ for all $1 \le i \ne j \le n$, then we get:

$$E_\pi = \sqrt{\frac{n}{(n - 1)\rho + 1}}\,E_a. \tag{10}$$


When the common correlation coefficient $\rho$ is 0 or near 0,
we see that equation 4 holds exactly or approximately.
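
To see how a common correlation changes the relationship, the sketch below evaluates the ratio of the two errors as $\sqrt{n/((n-1)\rho + 1)}$, the form the common-correlation case takes under definition 2. The function name and sample values are illustrative assumptions.

```python
import math

def error_ratio(n, rho):
    """Ratio E_pi / E_a for n platforms with common correlation rho."""
    denom = (n - 1) * rho + 1.0
    if denom <= 0:
        raise ValueError("requires (n - 1) * rho + 1 > 0")
    return math.sqrt(n / denom)

# Illustrative values only: four platforms, increasing correlation.
for rho in (0.0, 0.1, 0.5, 1.0):
    print(f"rho = {rho:.1f}  E_pi / E_a = {error_ratio(4, rho):.4f}")
```

With $\rho = 0$ the ratio is $\sqrt{n}$, matching equation 4; with perfectly correlated demands ($\rho = 1$) the ratio is 1, so aggregation no longer reduces the relative error.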

Autoregressive Time Series Model. Now we take into
consideration the time effect in the product demand. For
platform i, i = 1, 2, ..., n, at time t, t = 1, 2, ..., denote by $D_i^{(t)}$
the demand and $F_i^{(t)}$ the forecast. Suppose that the
demand stream over time at each platform can be modeled
by an autoregressive model AR(p). At platform i, the
autoregressive model assumes that the demand at the current
time t is a linear function of the past demands plus a
random disturbance, that is:

$$D_i^{(t)} = \sum_{j=1}^{p}\alpha_{ij}D_i^{(t-j)} + \varepsilon_i^{(t)},$$

where the $\alpha_{ij}$ are constant coefficients. Further, suppose
that the forecast $F_i^{(t)}$ is optimal given the historical
demand profile $\mathcal{F}_i^{(t-1)} = \sigma\left(D_i^{(1)}, D_i^{(2)}, \ldots, D_i^{(t-1)}\right)$. That is,
with $D_i^{(0)}, D_i^{(-1)}, \ldots, D_i^{(-(p-1))}$ properly initialized, for
$t \ge 1$:

$$D_i^{(t)} = \alpha_{i1}D_i^{(t-1)} + \cdots + \alpha_{ip}D_i^{(t-p)} + \varepsilon_i^{(t)}$$

and

$$F_i^{(t)} = E\left(D_i^{(t)} \mid \mathcal{F}_i^{(t-1)}\right)
= \alpha_{i1}D_i^{(t-1)} + \alpha_{i2}D_i^{(t-2)} + \cdots + \alpha_{ip}D_i^{(t-p)},$$

where $\varepsilon_i^{(1)}, \varepsilon_i^{(2)}, \ldots, \varepsilon_i^{(t)}, \ldots$ are independently and
identically distributed as $N(0, \sigma_i^2)$ and the random disturbance
at time t, that is, $\varepsilon_i^{(t)}$, is independent of the demand stream
before time t, that is, $\{D_i^{(t-1)}, D_i^{(t-2)}, \ldots\}$. Also, we
assume independence across platforms. With the above
modeling of the demand and forecast, what can we say
about the relationship between the two layers of forecast
errors?
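
The following sketch illustrates the one-step forecast of an AR(p) demand stream at a single platform: the forecast is the weighted sum of the p most recent demands, and the realized demand is the forecast plus a normal disturbance. The function names, coefficients, and starting values are illustrative assumptions, not values from the paper.

```python
import random

def ar_forecast(history, coeffs):
    """One-step forecast: sum of alpha_j * D^(t-j); history[-1] is the latest demand."""
    p = len(coeffs)
    recent = history[-p:][::-1]  # D^(t-1), D^(t-2), ..., D^(t-p)
    return sum(a * d for a, d in zip(coeffs, recent))

def simulate(coeffs, initial, sigma, steps, seed=0):
    """Generate demands step by step and return the forecast errors D - F."""
    rng = random.Random(seed)
    history = list(initial)
    errors = []
    for _ in range(steps):
        forecast = ar_forecast(history, coeffs)
        demand = forecast + rng.gauss(0.0, sigma)  # D = F + epsilon
        errors.append(demand - forecast)
        history.append(demand)
    return errors

errors = simulate(coeffs=[0.6, 0.3], initial=[100.0, 100.0], sigma=5.0, steps=5000)
print(f"mean forecast error = {sum(errors) / len(errors):.3f}")
```

Because the forecast error at each step is just the disturbance, it has mean zero and is independent of the forecast itself, which is the property the proof of theorem 7 relies on.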

Theorem 7: Based on definition 1 or 2 and the above
time-series modeling for the demand stream and forecast
stream, and assuming that the variances at all platforms
are the same, then at any time point, if definition 1 is
used:

$$E_\pi^{(t)} = \sqrt{n}\,E_a^{(t)}\,\tilde{C}_n, \tag{11}$$

where

$$\tilde{C}_n = \frac{E\left[\frac{1}{n}\sum_{i=1}^{n}\frac{1}{F_i^{(t)}}\right]}{E\left[\dfrac{1}{\frac{1}{n}\sum_{i=1}^{n}F_i^{(t)}}\right]},$$

and if definition 2 is used, then:

$$E_\pi^{(t)} = \sqrt{n}\,E_a^{(t)}. \tag{12}$$

Rewriting $C_n$ in equation 2 as

$$C_n = \frac{\frac{1}{n}\sum_{i=1}^{n}\frac{1}{F_i}}{\dfrac{1}{\frac{1}{n}\sum_{i=1}^{n}F_i}}$$

and taking expectations for the numerator and
denominator separately in the expression leads to $\tilde{C}_n$. Hence, it is
always true that $\tilde{C}_n \ge 1$.

Proofs 

Theorem 1 is a special case of theorem 5. Theorem 3 is a
special case of theorem 4. The proof for theorem 6 is
similar to that for theorem 5, with an application of lemma 1.

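Lemma 1: If $X \sim N(b, \sigma^2)$, then:

$$E|X| = \sqrt{\frac{2}{\pi}}\,\sigma e^{-b^2/2\sigma^2} + b\left[2\Phi(b/\sigma) - 1\right] = H(b, \sigma). \tag{13}$$

Proof of Lemma 1: Without loss of generality, we can
assume that $\sigma = 1$, since otherwise we can make a simple
transformation $Y = X/\sigma$.

$$E|X| = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}|x|\,e^{-(x-b)^2/2}\,dx
= \frac{1}{\sqrt{2\pi}}\int_{0}^{\infty}x\,e^{-(x-b)^2/2}\,dx + \frac{1}{\sqrt{2\pi}}\int_{0}^{\infty}y\,e^{-(y+b)^2/2}\,dy
= I(b) + I(-b),$$

where

$$I(b) = \frac{1}{\sqrt{2\pi}}\int_{0}^{\infty}x\,e^{-(x-b)^2/2}\,dx
= \frac{1}{\sqrt{2\pi}}\int_{-b}^{\infty}(y + b)\,e^{-y^2/2}\,dy
= \frac{1}{\sqrt{2\pi}}\,e^{-b^2/2} + b\,\Phi(b),$$

and hence

$$E|X| = \frac{2}{\sqrt{2\pi}}\,e^{-b^2/2} + b\,\Phi(b) + (-b)\,\Phi(-b)
= \sqrt{\frac{2}{\pi}}\,e^{-b^2/2} + b\left[2\Phi(b) - 1\right].$$
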

Proof of Theorem 1 Parts 2 and 3. First note that the
function $\varphi(x) = 1/x$ is convex over $(0, \infty)$. Let random variable
X have a uniform distribution on the set $\{F_i : 1 \le i \le n\}$, that
is, $P(X = F_i) = 1/n$. An application of the Jensen inequality
$E\varphi(X) \ge \varphi(EX)$ leads to the desired inequality. The second
part is based on the condition for the Jensen inequality to
become an equality.

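Proof of Theorem 4:

$$e(D_i, F_i) = \frac{\sqrt{\sigma^2 + b^2}}{F_i}, \qquad
E_a = e\left(\sum_{i=1}^{n}D_i, \sum_{i=1}^{n}F_i\right) = \frac{\sqrt{n\sigma^2 + (nb)^2}}{\sum_{i=1}^{n}F_i}.$$

Hence we have:

$$E_\pi = \frac{1}{n}\sum_{i=1}^{n}\frac{\sqrt{\sigma^2 + b^2}}{F_i}
= \sqrt{n}\,E_a C_n \sqrt{\frac{\sigma^2 + b^2}{\sigma^2 + nb^2}}.$$
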

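Proof of Theorem 5: Noting that:

$$D_i - F_i \sim N(b, \sigma^2) \quad\text{and}\quad \sum_{i=1}^{n}(D_i - F_i) \sim N(nb, n\sigma^2),$$

then we have:

$$\frac{E_\pi}{E_a} = \frac{E(e_\pi)}{E(e_a)}
= \frac{\frac{1}{n}\sum_{i=1}^{n}\frac{H(b, \sigma)}{F_i}}{\dfrac{H(nb, \sqrt{n}\,\sigma)}{\sum_{i=1}^{n}F_i}} \quad\text{(by lemma 1)}$$

$$= \sqrt{n}\,C_n\,
\frac{\sqrt{\frac{2}{\pi}}\,\sigma e^{-b^2/2\sigma^2} + b\left[2\Phi(b/\sigma) - 1\right]}
{\sqrt{\frac{2}{\pi}}\,\sigma e^{-nb^2/2\sigma^2} + \sqrt{n}\,b\left[2\Phi(\sqrt{n}\,b/\sigma) - 1\right]}.$$
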

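Proof of Theorem 7: The proofs for equations 11 and 12
are similar. We give a proof for equation 11 only. First
notice that $D_i^{(t)} - F_i^{(t)} = \varepsilon_i^{(t)} \sim N(0, \sigma^2)$. At any given
time t, by the definitions for $E_\pi^{(t)}$ and $E_a^{(t)}$, we have:

$$E_\pi^{(t)} = E\left[\frac{1}{n}\sum_{i=1}^{n}\frac{|\varepsilon_i^{(t)}|}{F_i^{(t)}}\right]
= \frac{1}{n}\sum_{i=1}^{n}E\left(|\varepsilon_i^{(t)}|\right)E\left(\frac{1}{F_i^{(t)}}\right)
= \sqrt{\frac{2}{\pi}}\,\sigma\,\frac{1}{n}\sum_{i=1}^{n}E\left(\frac{1}{F_i^{(t)}}\right).$$

The second step follows from the fact that $\varepsilon_i^{(t)}$ is
independent of demands before time t, and hence independent
of the optimal forecast at time t, $F_i^{(t)}$. The last step follows
from lemma 1 and the same variance assumption across
platforms.

$$E_a^{(t)} = E\left[\frac{\left|\sum_{i=1}^{n}\varepsilon_i^{(t)}\right|}{\sum_{i=1}^{n}F_i^{(t)}}\right]
= E\left|\sum_{i=1}^{n}\varepsilon_i^{(t)}\right| \cdot E\left[\frac{1}{\sum_{i=1}^{n}F_i^{(t)}}\right]
= \sqrt{n}\,\sigma\sqrt{\frac{2}{\pi}}\;E\left[\frac{1}{\sum_{i=1}^{n}F_i^{(t)}}\right].$$

The reasoning is the same as for proving $E_\pi^{(t)}$ above.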



Conclusion 



Forecast errors increase the complexity and difficulty of
the production planning process. This results in excessive
inventory costs and reduces on-time delivery. In this paper
we have studied the forecast errors for the case of several 
products using the same component. Because data for the 
component demand (both actual demand and forecast 
demand) is easier to obtain at the aggregate product level 
than at the individual product level, we focused on the 
theoretical relationships between forecast errors at these 
two levels. 

Our first task was to propose formal definitions for
measuring forecast errors under different rationales and
technical assumptions. The second task was to formally derive
relationships between forecast errors at the two levels. As
part of our work we proved the validity of a heuristic
formula proposed by Mark Sower of the business operations
planning department at the HP Roseville, California site.

In addition to analyzing the two-level problem, we derived
a theoretical basis for relaxing the usual assumptions
concerning correlations in the data across products and over
time.
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Strengthening Software Quality Assurance 



Mutsuhiko Asada



Pong Mang Yan



Increasing time-to-market pressures in recent years have resulted in a 
deterioration of the quality of software entering the system test phase. At 
HP's Kobe Instrument Division, the software quality assurance process was
reengineered to ensure that released software is as defect-free as possible. 



The Hewlett-Packard Kobe Instrument Division (KID) develops
measurement instruments. Our main products are LCR meters and network, 
spectrum, and impedance analyzers. Most of our software is built into these 
instruments as firmware. Our usual development language is C. Figure 1 
shows our typical development process. 
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Given adequate development time, we are able to include sufficient software 
quality assurance activities (such as unit test, system test, and so on) to provide 
high-quality software to the marketplace. However, several years ago, time-to- 
market pressure began to increase and is now very strong. There is no longer 
enough development time for our conventional process. In this article, we
describe our perceived problems, analyze the causes, describe countermeasures
that we have adopted, and present the results of our changes.



Figure 1

Hewlett-Packard Kobe Instrument Division software development process
before improvement.
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Existing Development Process 

The software development process that we have had in 
place since 1986 includes the following elements: 

• Improvement in the design phase. We use structured
design methods such as modular decomposition, we use
defined coding conventions, and we perform design
reviews for each software module.

• Product series strategy. The concept of the product
series is shown in Figure 2. First, we develop a
platform product that consists of newly developed digital
hardware and software. We prudently design the
platform to facilitate efficient development of the next and
succeeding products. We then develop extension
products that reuse the digital hardware and software of the
platform product. Increasing the reuse rate of the
software in this way contributes to high software quality.

• Monitoring the defect curve. The defect curve is a plot
of the cumulative number of defects versus testing time
(Figure 3). We monitor this curve from the beginning
of system test and make the decision to exit from
the system test phase when the curve shows sufficient
convergence.
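
Monitoring a defect curve can be sketched in a few lines. The article does not state the exit criterion used at KID, so the convergence rule below (at most one new defect over the last five test periods), the function names, and the sample counts are purely hypothetical.

```python
def cumulative_defects(counts_per_period):
    """Running total of defects found: the y values of the defect curve."""
    curve, total = [], 0
    for count in counts_per_period:
        total += count
        curve.append(total)
    return curve

def has_converged(curve, window=5, max_new=1):
    """Hypothetical exit test: at most max_new defects in the last `window` periods."""
    if len(curve) <= window:
        return False
    return curve[-1] - curve[-1 - window] <= max_new

# Illustrative defect counts per test period (not measured data).
flattening = cumulative_defects([5, 4, 3, 2, 1, 0, 0, 1, 0, 0])
still_rising = cumulative_defects([5, 4, 3, 2, 1, 2, 3])
print(flattening, has_converged(flattening))
print(still_rising, has_converged(still_rising))
```

A curve that is still rising at the end of the available test time fails the check, which corresponds to the non-convergence problem described later in the article.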

As a result of the above activities, our products' defect
density (the number of defects within one year after
shipment per thousand noncomment source statements) had
been decreasing. In one product, fewer than five defects
were discovered in customer use.

Perceived Problems 

Strong time-to-market pressure, mainly from consumers 
and competitors, has made our development period and 
the interval between projects shorter. As a result, we 
have recognized two significant problems in our products 
and process: a deterioration of software quality and an 
increase in maintenance and enhancement costs. 

Figure 2 

The product series concept increases the software reuse 
rate, thereby increasing software quality. 

Figure 3 

Typical defect curves. 

Deterioration of software quality. In recent years (1995 
to 1997), software quality has apparently been deteriorating 
before the system test phase. In our analysis, this phenomenon 
is caused by a decrease in the coverage of unit 
and integration testing. In previous years, R&D engineers 
independently executed unit and integration testing of the 
functions that they implemented before the system test 
phase. At present, those tests are not executed sufficiently 
because of the shortness of the implementation phase 
under high time-to-market pressure. Because of the 
decrease in test coverage, many single-function defects 
(defects within the range of a function, as opposed to 
combination-function defects) remain in the software at 
the start of system test (Figure 4). Also, our system test 
periods are no longer as long. We nearly exhaust our testing 
time detecting single-function defects in shallow software 
areas, and we often don't reach the combination-function 
defects deep within the software. This makes 
it less likely that we will get convergence of the defect 
curve in the limited system test phase (Figure 5). 

Increase of maintenance and enhancement costs. 
For our measurement instruments, we need to enhance 
the functionality continuously to satisfy customers' requirements 
even after shipment. In recent products, 
the enhancement and maintenance cost is increasing 
(Figure 6). This cost consists of work for the addition of 
new functions, the testing of new and modified functions, and 
so on. In our analysis, this phenomenon occurs for the 
following reasons. First, we often begin to implement 
functions when the detailed specifications are still vague 
and the relationships of functions are still not clear. 
Second, specifications can change to satisfy customer 
needs even in the implementation phase. Thus, we may 
have to implement functions that are only slightly different 
from already existing functions, thereby increasing the 
number of functions and pushing the cost up. Figure 7 
shows that the number of functions increases from one 
product to another even though the two products are 
almost the same. 

Figure 4 

Change in the proportion of single-function defects found 
in the system test phase. 

Figure 5 

Defect curves for post-1995 products. 

Figure 6 

Increase in the cost per function of enhancement and 
maintenance. The first enhancements for Product B 
occurred in 1991. 

Often the internal software structure is not suitable for a 
particular enhancement. This can result from vague function 
definition in the design phase, which can make the 
software structure inconsistent and not strictly defined. 
In the case of our combination network and spectrum 
analyzers, we didn't always examine all the relationships 
among analyzer modes and the measurement and analyzer 
functions (e.g., different display formats for network and 
spectrum measurement modes). 



Figure 7 

Increase in the number of commands in two similar 
analyzers as a result of changing customer needs. 

Naturally, the enhancement process intensely disturbs the 
internal software structures, which forces us to go through 
the same processes repeatedly and detect and fix many 
additional side-effect defects. 

Countermeasures 1 

If we had enough development time, our problems would 
be solved. However, long development periods are no 
longer possible in our competitive marketplace. Therefore, 
we have improved the development process upstream to 
handle these problems. We have set up two new checkpoints 
in the development process schedule to make sure 
that improvement is steady (Figure 8). In this section we 
describe the improvements. 

We plan to apply these improvement activities in actual 
projects over a three-year span. The software quality 
assurance department (SWQA) will appropriately revise 
this plan and improve it based on experience with actual 
projects. 

Design Phase: Improvement of Function Definition. We 
have improved function definition to ensure sufficient 
investigation of functions and sufficient testing to remove 
single-function defects early in the development phase. 



Figure 8 

Improved software development process. 

We concisely describe each function's effects, range of 
parameters, minimum argument resolution, related functions, 
and so on in the function definition (Figure 9). 
Using this function definition, we can prevent duplicate 
or similar functions and design the relationships of the 
measurement modes and functions precisely. In addition, 
we can clearly define functions corresponding to the 
product specifications and clearly check the subordinate 
functions, so that we can design a simple and consistent 
internal software structure. We can also easily write the 
test scripts for the automatic tests, since all of the necessary 
information is in the function definitions. 
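
To make the idea of a function definition concrete, the sketch below models one entry as a small record and uses it to flag possible duplicate functions. The actual template is the one shown in Figure 9 and is not reproduced here, so the field names, the command mnemonics, and the duplicate check are assumptions chosen only to mirror the items listed in the text (effect, parameter range, minimum resolution, related functions).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FunctionDefinition:
    """One entry of a function definition (field names are assumptions)."""
    command: str        # hypothetical remote-control command mnemonic
    effect: str         # what the function does
    minimum: float      # lower limit of the parameter range
    maximum: float      # upper limit of the parameter range
    resolution: float   # minimum argument resolution
    related: tuple = field(default_factory=tuple)  # related functions


def find_duplicates(definitions):
    """Group commands whose effect and parameter range are identical,
    a crude check for duplicate or near-duplicate functions."""
    seen = {}
    for d in definitions:
        key = (d.effect.lower(), d.minimum, d.maximum, d.resolution)
        seen.setdefault(key, []).append(d.command)
    return [cmds for cmds in seen.values() if len(cmds) > 1]


# Invented entries for illustration; not commands of any real analyzer.
defs = [
    FunctionDefinition("STARTFREQ", "Set sweep start frequency", 10.0, 5e8, 0.001),
    FunctionDefinition("FREQSTART", "set sweep start frequency", 10.0, 5e8, 0.001),
    FunctionDefinition("AVERCOUNT", "Set averaging factor", 1, 999, 1),
]
print(find_duplicates(defs))  # [['STARTFREQ', 'FREQSTART']]
```

Because each entry carries the range and resolution explicitly, the same records can also drive test-script generation.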

SWQA, not R&D, has ownership of the template for function 
definition. SWQA manages and standardizes this 
template to prevent quality deterioration and ensure that 
improvements that have good effects are carried on to 
future projects. 

Checkpoint at the End of the Design Phase. The first 
new checkpoint in the development process is at the end 
of the design phase. SWQA confirms that all necessary 
information is contained in the function definitions. SWQA 
approves the function definitions before the project goes 
on to the implementation phase. 

Implementation Phase: Automatic Test Execution. In 
this phase, SWQA mainly writes test scripts based on the 
function definitions for automatic tests to detect single-function 
defects. We use equivalence partitioning and 
boundary value analysis to design test scripts. As for 
combination-function defects, since the number of combinations 
is almost infinite, we write test scripts based only 
on the content of the function definitions. When we implement 
the functions, we immediately execute the automatic 
tests by using the scripts corresponding to these 
functions. Thus, we confirm the quality of the software as 
soon as possible. For functions already tested, we re-execute 
the automatic tests periodically and check for 
side effects caused by new function implementations. As 
a result of these improvements, we obtain software with 
no single-function defects before the system test phase, 
thereby keeping the software quality high in spite of the 
short development period. The test scripts are also used 
in regression testing after shipment to confirm the quality 
of modified software in the enhancement process. In this 
way, we can reduce maintenance and enhancement costs. 
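
Equivalence partitioning and boundary value analysis can be derived mechanically from the range and resolution recorded in a function definition. The sketch below is one possible way to do this for a single numeric parameter. The helper names and the example parameter are assumptions, and the article's actual test scripts are not shown in the source.

```python
def boundary_values(minimum, maximum, resolution):
    """Boundary value analysis for a numeric parameter: values just outside,
    at, and just inside each limit, stepping by the resolution."""
    return [
        minimum - resolution, minimum, minimum + resolution,
        maximum - resolution, maximum, maximum + resolution,
    ]


def partition_representatives(minimum, maximum, resolution):
    """Equivalence partitioning: one representative value from each class
    (below the range, inside the range, above the range)."""
    return {
        "below": minimum - resolution,
        "valid": (minimum + maximum) / 2,
        "above": maximum + resolution,
    }


def expected_ok(value, minimum, maximum):
    """A value should be accepted only when it lies inside the range."""
    return minimum <= value <= maximum


# Hypothetical parameter: an averaging factor from 1 to 999 in steps of 1.
cases = [(v, expected_ok(v, 1, 999)) for v in boundary_values(1, 999, 1)]
print(cases)
```

Each pair gives a test input and whether the instrument is expected to accept it, which is the information a single-function test case needs.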

Figure 9 

An example of the improvement in function definition. (a) Before improvement. (b) After improvement. 

Checkpoint at the End of the Implementation Phase. At 
the second new checkpoint in the development process, 
SWQA confirms that the test scripts reflect all the content 
of the function definitions, and that there are no significant 
problems in the test results. The project cannot go 
on to the system test phase without this confirmation. 

System Test Phase: Redefinition of System Testing. 
In an ideal testing process, we can finish system testing 
when we have executed all of the test items in the test 
cases we have written. However, if many single-function 
defects are left in the software at the start of system test, 
we will detect single-function and combination-function 
defects simultaneously, and the end of testing will become 
unclear. Therefore we use statistical methods, such as 
convergence of the defect curve, to decide when to end 
the system test phase. 

In our improved process, we can start the system test 
phase with high-quality code that includes only a few 
single-function defects. Thus, we can redefine the testing 
method to get more efficiency in detecting the remaining 
defects. We divide the system test items into two test 
groups. The first group uses black box testing. We write 
these test cases based on the instrument characteristics 
as a system and on common failures that have already 
been detected in the preceding series products. The 
second group is measurement application testing, which 
is known as white box testing. The R&D designers, who 
clearly know the measurement sequence, test the measurement 
applications according to each instrument's 
specifications. We try to decide the end of system test 
based on the completion of test items in the test cases 
written by R&D and SWQA. We try not to depend on 
statistical methods. 

Checkpoint at the End of the System Test Phase. We use 
this checkpoint as in the previous process, as an audit 
point to exit the system test phase. SWQA confirms the 
execution of all test items and results. 

A Feasibility Study of Automatic Test 

Before implementing the improved development process 
described above, we wanted to understand what kind of 
function is most likely to cause defects and which parts 
we can't test automatically. Therefore, we analyzed and 
summarized the defect reports from a previous product 
series (five products). We found that the front-panel keys, 
the HP-IB remote control functions, and the Instrument 
BASIC language are most likely to cause defects. We also 
observed that the front-panel keys and the display are 
difficult to test automatically. Based on this study, we 
knew which parts of the functions needed to be written 
clearly in the function definitions, and we edited the test 
items and checklist to make the system test more efficient. 

Application of the Improvement Process 

Project Y. Product Y is an extension and revision of Product 
X, a combination network, spectrum, and impedance 
analyzer. The main purpose of Project Y was to change 
the CRT display to a TFT (thin-film transistor) display and 
the HP-IB printer driver to a parallel printer driver. Most 
of the functions of the analyzer were not changed. 

Since Product Y is a revision product, we didn't have to 
write new function definitions for the HP-IB commands. 
Instead, we used the function reference manual, which 
has the closest information to a function definition. The 
main purpose of the test script was to confirm that each 
command worked without fail. We also tested some combination 
cases (e.g., testing each command with different 
channels). The test script required seven weeks to write. 
The total number of lines is 20,141. 

For the automatic tests, we analyzed the defect reports 
from five similar products and selected the ones related 
to the functions that are also available in Product Y (391 
defect reports in the system phase). Then we identified 
the ones that could be tested automatically. The result 
was 140 reports, which is about 40% of the total. The 
whole process took three weeks to finish and the test 
script contains 1972 lines. The rest of the defect reports 
were checked manually after the end of system test. 
It took about seven hours to finish this check. 

Both of the above test scripts were written for an in-house 
testing tool developed by the HP Santa Clara Division. 3 
An external controller (workstation) transfers the 
command to the instrument in ASCII form, receives the 
response, and decides if the test result passes or fails. 
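
The send, receive, and compare cycle described above can be sketched as follows. The in-house testing tool itself is not described in the source, so this is only an illustration: `FakeInstrument` is an invented stand-in for the instrument link, and the command names and replies are hypothetical.

```python
class FakeInstrument:
    """Stand-in for the instrument link. A real setup would talk to the
    instrument over an HP-IB interface; commands and replies are invented."""

    def __init__(self):
        self._settings = {}
        self._last = ""

    def write(self, data: bytes) -> None:
        # Commands arrive as ASCII text: "NAME value" sets, "NAME?" queries.
        name, _, value = data.decode("ascii").strip().partition(" ")
        if not name.endswith("?"):
            self._settings[name] = value
        self._last = name

    def read(self) -> bytes:
        # Reply with the stored value for the most recent query.
        value = self._settings.get(self._last.rstrip("?"), "ERROR")
        return (value + "\n").encode("ascii")


def run_case(link, set_cmd, query_cmd, expected):
    """Send a setting command, query it back, and return 'PASS' or 'FAIL'."""
    link.write((set_cmd + "\n").encode("ascii"))
    link.write((query_cmd + "\n").encode("ascii"))
    response = link.read().decode("ascii").strip()
    return "PASS" if response == expected else "FAIL"


link = FakeInstrument()
print(run_case(link, "AVERCOUNT 16", "AVERCOUNT?", "16"))  # PASS
print(run_case(link, "AVERCOUNT 16", "AVERCOUNT?", "32"))  # FAIL
```

Keeping the pass or fail decision on the controller side means the same script can be rerun unchanged for regression testing.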

Instrument BASIC (IBASIC), the internal instrument control 
language, has many different functions. It comes with 
a suite of 295 test programs, which we executed automatically 
using a workstation. The workstation downloaded 
each test program to the instrument, ran the program, and 
saved the result. When all the programs finished running, 
we checked if the result was pass or fail. 

For all of the automatic testing, we used the UNIX make 
command to manage the test scripts. The make command 
let each test program execute sequentially. 

Using the test scripts, we needed only half a day to test all 
of the HP-IB commands and one day to test the IBASIC. 
Since Product Y is a revision product, we also used the 
test scripts to test its predecessor, Product X, to confirm 
that Product Y is compatible with Product X. The test 
items in the Product X checklist were easily modified to 
test Product Y. 

Project Z. Product Z belongs to the same product series 
as Product Y (a combination network, spectrum, and 
impedance analyzer). The source code reuse rate from 
Product Y is 77%. 

One R&D engineer took one month to finish the first draft 
of the function definitions. To test the individual HP-IB 
commands, since the necessary function definition infor- 
mation existed, we easily modified the test script for 
Product Y to test Product Z. We employed a third-party 
engineer to write the test scripts. This took five weeks. 

Since Product Z is in the same series as Product Y, we are 
reusing the test scripts for Product Y and adding the new 
test scripts corresponding to the new defects that were 
detected in Product Y to test Product Z. 

The IBASIC is the same as Product Y's, so we use the same 
test program for Product Z. The automatic test environment 
is also the same as for Product Y. 

Since Product Z is still under development, we don't have 
the final results yet. We use the test scripts to confirm the 
individual HP-IB commands periodically. This ensures that 
the quality of the instrument's software doesn't degrade 
as new functions are added. At this writing, we haven't 
started system test, but we plan to reuse the same product 
series checklist to test Product Z. 

Results 

Project Y. In this project, we found 24 mistakes in the 
manual, 68 defects in Product X while preparing the test 
scripts, and 53 defects in Product Y during system test. 
The following table lists the total time spent on testing 
and the numbers of defects that were detected in Product 
X in Project X and Project Y. 



Table I 

Defects found in Product X 

                         Project X     Project Y 
Testing Time (hours)     1049          200 
Number of Defects        (illegible)   88 



According to this data, using the test scripts based on the 
function reference manual, we detected 88 defects in 
Product X during Project Y, even though we had already 
invested more than 1000 test hours in Project X and the 
defect curve had already converged (Figure 3). We conclude 
that testing the software with a test script increases 
the ability to detect defects. Also we see that a function 
definition is indispensable for writing a good test script. 

Since the automatic test is executed periodically during 
the implementation phase, we can assume that no single-function 
defects remained in Product Y's firmware before 
system test. Since Product Y is a revision product, there 
were only a few software modifications, and we could 
assume that the test items for the system testing covered 
all the modified cases. Therefore, we could make a decision 
to stop the system test when all the test items were 
completed, even though the defect curve had not converged 
(Figure 10). However, for a platform product or 
an extension product that has many software modifications 
and much new code, the test items of the system 
test are probably not complete enough to make this decision, 
and we will still have to use the convergence of the 
defect curve to decide the end of the system test. Nevertheless, 
it will always be our goal to make the test items 
of the system test complete enough that we can make 
decisions in the future as we did in Project Y. 

The test script is being used for regression testing during 
enhancement of Product Y to prevent the side effects 
caused by software modifications. 

In Figure 11, we compare the test time and the average 
defect detection time for these two projects. Because 
Product Y is an extension of Product X, the results are 
not exactly comparable, but using the test script appears 
to be better because it didn't take as much time to detect 
the average defect. 
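
As a rough illustration of the comparison, the average defect detection time can be read as test hours divided by defects found. That interpretation is an assumption, since Figure 11 is not reproduced here. Only the Project Y figures stated in the text (200 test hours and 88 defects found in Product X) are used.

```python
def hours_per_defect(test_hours, defects):
    """Average test time spent per detected defect."""
    return test_hours / defects


# Product X figures for Project Y, as stated in the text.
print(round(hours_per_defect(200, 88), 2))  # 2.27
```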

Figure 10 

Defect curve for Project Y. (Horizontal axis: test hours, 0 to 120.) 



We needed time to write the test scripts, but the system 
test phase became shorter, so the total development time 
was shorter for Project Y. The enhancement cost will be 
lower because we can reuse the same test script for 
regression testing. 



Project Z. We expect that the quality of Product Z will be 
high before system test because we test Product Z periodically 
in the implementation phase and confirm the result 
before entering system test. 

The additional work of the improvement process is to 
write formal function definitions and test scripts. Since 
this project is the first to require a formal function definition, 
it took the R&D engineer one month to finish the 
first draft. For the next project, we expect that the function 
definition can be mostly reused, so the time needed 
to write it will be shorter. 

The test scripts are written during the implementation 
phase and do not affect the progress of the project. Therefore, 
we only need to wait about a month for writing the 
function definition before starting the implementation 
phase, and since the time needed for system test will be 
shorter, the whole development process will be faster. 

Since we are reusing the test scripts of Product Y, the 
time for writing test scripts for Product Z is two weeks 
shorter than for Product Y. Thus, for a series product, we 
can reuse the test scripts to make the process faster. Also, 
making test scripts is not a complicated job, so a third-party 
engineer can do it properly. 

Conclusion 

We analyzed the software (firmware) development problems 
of the Hewlett-Packard Kobe Instrument Division 
and decided on an improvement process to solve these 
problems. This improvement process has been applied to 
two projects: Project Y and Project Z. The results show 
that we can expect the new process to keep the software 
quality high with a short development period. The main 
problems — deteriorating software quality and increasing 
enhancement cost — have been reduced. 

With the addition of a compiler, HP VEE programs can now benefit from 
improved execution speed and still provide the advantages of an interactive 
interpreter. 

This article presents the major algorithmic aspects of a compiler for the 
Hewlett-Packard Visual Engineering Environment (HP VEE). HP VEE is a 
powerful visual programming language that simplifies the development of 
engineering test-and-measurement software. In the HP VEE development 
environment, engineers design programs by linking visual objects (also called 
devices) into block diagrams. A simple example is shown in Figure 1. 
Features provided in HP VEE include: 

■ Support for engineering math and graphics 

■ Instrument control 

■ Concurrency 

■ Data management 

■ GUI support 

■ Test sequencing 

■ Interactive development and debugging environment. 

Beginning with release 4.0, HP VEE uses a compiler to improve the execution 
speed of programs. The compiler translates an HP VEE program into 
byte-code that is executed by an efficient interpreter embedded in HP VEE. By 
analyzing the control structures and data type use of an HP VEE program, the 
compiler determines the evaluation order of devices, eliminates unnecessary 
run-time decisions, and uses appropriate data structures. 

The HP VEE 4.0 compiler increases the performance of computation-intensive 
programs by about 40 times over previous versions of HP VEE. In applications 
where execution speed is constrained by instruments, file 
input and output, or display update, performance typically 
increases by 150 to 400 percent. 

Figure 1 

A simple HP VEE program to compute the area of a circle. 

The compiler described in this article is a prototype developed 
by HP Laboratories to compile HP VEE 3.2 programs. 
The compiler in HP VEE 4.0 differs in some details. 
The HP VEE prototype compiler consists of five 
components: 

■ Graph Transformation. Transformations are performed 
on a graph representation of the HP VEE program. The 
transformations facilitate future compilation phases. 

■ Device Scheduling. An execution ordering of devices 
is obtained. The ordering may have hierarchical elements, 
such as iterators, that are recursively ordered. 
The ordering preserves the data flow and control flow 
relationships among devices in the HP VEE program. 
Scheduling does not, however, represent the run-time 
flow branching behavior of special devices such as 
If/Then/Else. 

■ Guard Assignment. The structure produced by scheduling 
is extended with constructs that represent run-time 
flow branching. Each device is annotated with boolean 
guards that represent conditions that must be satisfied 
at run time for the device to run. Adjacent devices with 
similar guards are grouped together to decrease redundancy 
of run-time guard processing. Guards can result 
from explicit HP VEE branching constructs such as 
If/Then/Else, or they can result from implicit properties 
of other devices, such as guards that indicate whether 
an iterator has run at least once. 

■ Type Annotation. Devices are annotated with type information 
that gives a conservative analysis of what types 
of data are input to, and output from, a device. The annotations 
can be used to generate type-specific code. 

■ Code Generation. The data structures maintained by the 
compiler are traversed to generate target code. The 
prototype compiler can generate C code and byte-code. 
However, code generation is relatively straightforward to 
implement for most target languages. 
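
The device scheduling step can be pictured as ordering a dependency graph. The article does not say which algorithm the prototype uses, so the sketch below substitutes a plain topological sort from the Python standard library and ignores hierarchical elements such as iterators. The device names are hypothetical, loosely modeled on a program that computes the area of a circle.

```python
from graphlib import TopologicalSorter


def schedule(dependencies):
    """Return one execution order in which every device runs after all of
    the devices it depends on. `dependencies` maps device -> set of inputs."""
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical device graph: two inputs feed a formula, which feeds a display.
graph = {
    "radius": set(),
    "pi": set(),
    "area_formula": {"radius", "pi"},
    "display": {"area_formula"},
}
order = schedule(graph)
print(order)
```

Any order that places both inputs before the formula and the formula before the display is valid. A cyclic graph makes `static_order()` raise `graphlib.CycleError`, so feedback loops would need separate handling.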

These components combine to implement the semantics 
explicitly and implicitly specified in an HP VEE program. 
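
The grouping of adjacent devices by guard can also be sketched briefly. This is an illustration under stated assumptions, not the prototype's implementation: guards are modeled as sets of condition names, "similar" is simplified to "identical", and the device names, guard names, and emitted pseudo code are invented.

```python
from itertools import groupby


def group_by_guard(scheduled):
    """Merge runs of adjacent devices that carry the same guard set, so the
    guard is evaluated once per run instead of once per device.
    `scheduled` is a list of (device, frozenset_of_guards) in run order."""
    return [
        (guards, [device for device, _ in run])
        for guards, run in groupby(scheduled, key=lambda item: item[1])
    ]


def emit(groups):
    """Render the grouped schedule as pseudo target code (illustrative)."""
    lines = []
    for guards, devices in groups:
        indent = ""
        if guards:
            lines.append("if " + " and ".join(sorted(guards)) + ":")
            indent = "    "
        lines.extend(indent + "run(" + d + ")" for d in devices)
    return "\n".join(lines)


# Hypothetical schedule: two devices run only when the Then branch is taken.
plan = [
    ("input", frozenset()),
    ("if_then_else", frozenset()),
    ("scale", frozenset({"then_taken"})),
    ("log", frozenset({"then_taken"})),
    ("display", frozenset()),
]
print(emit(group_by_guard(plan)))
```

In this example the two guarded devices share one `if` test, which is the redundancy reduction the guard assignment step aims for.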



Online Information 

This complete article can be found at: 
http://www.hp.com/hpj/98may/ma98a13.htm 

More information about HP VEE can be found at: 
http://www.hp.com/go/HPVEE 




May 1998 • The Hewlett-Packard Journal 



© Copr. 1949-1998 Hewlett-Packard Co. 






The Hewlett-Packard Journal 

The Hewlett-Packard Journal is published by the Hewlett-Packard Company to recognize technical contributions made by Hewlett- 
Packard (HP) personnel. While the information found in this publication is believed to be accurate, the Hewlett-Packard Company dis- 
claims all warranties of merchantability and fitness for a particular purpose and all obligations and liabilities for damages, including but 
not limited to indirect, special, or consequential damages, attorney's and expert's fees, and court costs, arising out of or in connection 
with this publication. 





Subscriptions 

The Hewlett-Packard Journal is distributed free of charge to HP research, design, and manufacturing engineering personnel, as well as 
to qualified non-HP individuals, libraries, and educational institutions. 

To receive an HP employee subscription send an e-mail message indicating your HP entity, employee number, and mailstop to: 
ldc_litpro@hp0000.hp.com 

Qualified non-HP individuals, libraries, and educational institutions in the U.S. can request a subscription by going to our website and 
following the directions to subscribe. 

To request an International subscription locate your nearest country representative listed on our website and contact them directly for a 
subscription. Free subscriptions may not be available in all countries. 

Back issues of the Hewlett-Packard Journal can be ordered through our website. 



Our Website 

Current and recent issues are available online at http://www.hp.com/hpj/journal.html 



Submissions 

Although articles in the Hewlett-Packard Journal are primarily authored by HP employees, articles from non-HP authors dealing with 
HP-related research or solutions to technical problems made possible by using HP equipment are also considered for publication. 
Before doing any work on an article, please contact the editor by sending an e-mail message to: hpJournal@hp.com 

Copyright 

Copyright 1998 Hewlett-Packard Company. All rights reserved. Permission to copy without fee all or part of this publication is hereby 
granted provided that 1) the copies are not made, used, displayed, or distributed for commercial advantage; 2) the Hewlett-Packard 
Company copyright notice and the title of the publication and date appear on the copies; and 3) a notice appears stating that the copying 
is by permission of the Hewlett-Packard Company. 



Inquiries 

Please address inquiries, submissions, and requests to: 



Editor 

Hewlett-Packard Journal 

3000 Hanover Street, Mail Stop 20BH 

Palo Alto, CA 94304-1185 U.S.A. 



HEWLETT-PACKARD 

Journal 

MAY 1998 • Volume 49, Number 2 

Technical Information from the Hewlett-Packard Company 






5966-2130E 






